Abstract

We prove that under mild positivity assumptions the entropy rate of a hidden Markov chain varies analytically as a function of the underlying Markov chain parameters. A general principle to determine the domain of analyticity is stated. An example is given to estimate the radius of convergence for the entropy rate. We then show that the positivity assumptions can be relaxed, and examples are given for the relaxed conditions. We study a special class of hidden Markov chains in more detail: binary hidden Markov chains with an unambiguous symbol, and we give necessary and sufficient conditions for analyticity of the entropy rate for this case. Finally, we show that under the positivity assumptions the hidden Markov chain itself varies analytically, in a strong sense, as a function of the underlying Markov chain parameters.

1 Introduction 

For $m, n \in \mathbb{Z}$ with $m \le n$, we denote a sequence of symbols $y_m, y_{m+1}, \ldots, y_n$ by $y_m^n$. Consider a stationary stochastic process $Y$ with a finite set of states $\{1, 2, \ldots, B\}$ and distribution $p(y_m^n)$. Denote the conditional distributions by $p(y_{n+1} \mid y_m^n)$. The entropy rate of $Y$ is defined as

$$H(Y) = \lim_{n \to \infty} -E_p\left(\log p(y_0 \mid y_{-n}^{-1})\right),$$

where $E_p$ denotes expectation with respect to the distribution $p$.
Let $Y$ be a stationary first order Markov chain with

$$A(i, j) = p(y_1 = j \mid y_0 = i).$$

It is well known that

$$H(Y) = -\sum_{i,j} p(y_0 = i)\, A(i, j) \log A(i, j).$$
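The closed form above is straightforward to evaluate. The following NumPy sketch is an illustration added here, not code from the paper; `stationary` and `markov_entropy_rate` are hypothetical helper names. It computes $H(Y)$ in nats by solving $pA = p$ for the stationary distribution and using the convention $0 \log 0 = 0$.

```python
import numpy as np


def stationary(A):
    """Stationary row vector p with p A = p and sum(p) = 1 (assumes 1 is a simple eigenvalue)."""
    B = A.shape[0]
    M = A.T - np.eye(B)
    M[-1, :] = 1.0            # replace one redundant equation by the normalization sum(p) = 1
    rhs = np.zeros(B)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)


def markov_entropy_rate(A):
    """H(Y) = -sum_{i,j} p(y_0 = i) A(i, j) log A(i, j), with 0 log 0 = 0."""
    p = stationary(A)
    with np.errstate(divide="ignore"):
        logA = np.where(A > 0, np.log(A), 0.0)
    return float(-np.sum(p[:, None] * A * logA))


A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(stationary(A), markov_entropy_rate(A))
```

Replacing one equation of $p(A - I) = 0$ by the normalization gives a nonsingular system whenever 1 is a simple eigenvalue of $A$, which is the situation assumed throughout the paper.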

A hidden Markov chain $Z$ (or function of a Markov chain) is a process of the form $Z = \Phi(Y)$, where $\Phi$ is a function defined on $\{1, 2, \ldots, B\}$ with values in a finite alphabet $\mathcal{A}$. Often a hidden Markov chain is defined as a Markov chain observed in noise. It is well known that the two definitions are equivalent (the equivalence is typified by Example 4.1).

For a hidden Markov chain, $H(Z)$ turns out (see Equation (2.4) below) to be the integral of a certain function defined on a simplex with respect to a measure due to Blackwell [?]. However, Blackwell's measure is somewhat complicated, and the integral formula appears to be difficult to evaluate in most cases.

Recently there has been a rebirth of interest in computing the entropy rate of a hidden Markov chain, and many approaches have been adopted to tackle this problem. For instance, some researchers have used Blackwell's measure to bound the entropy rate, and others introduced a variation [?] on bounds due to [2].

In a new direction, [?] have studied the variation of the entropy rate as parameters of the underlying Markov chain vary. These works motivated us to consider the general question of whether the entropy rate of a hidden Markov chain is smooth, or even analytic [?], as a function of the underlying parameters. Indeed, this is true under mild positivity assumptions:

Theorem 1.1. Suppose that the entries of $A$ are analytically parameterized by a real variable vector $\varepsilon$. If at $\varepsilon = \varepsilon_0$,

1. for all $a$, there is at least one $j$ with $\Phi(j) = a$ such that the $j$-th column of $A$ is strictly positive, and

2. every column of $A$ is either all zero or strictly positive,

then $H(Z)$ is a real analytic function of $\varepsilon$ at $\varepsilon_0$.

Note that this theorem holds if all the entries of $A$ are positive. The more general form of our hypotheses is very important (see Example 4.1).

Real analyticity at a point is important because it means that the function can be expressed as a convergent power series in a neighborhood of the point. The power series can be used to approximate or estimate the function. For the convenience of the reader, we recall some basic concepts of analyticity in Section 3.

Several authors have observed that the entropy rate of a hidden Markov chain can be viewed as the top Lyapunov exponent of a random matrix product [?]. Results in [?] show that under certain conditions the top Lyapunov exponent of a random matrix product varies analytically as either the underlying Markov process varies analytically or as the matrix entries vary analytically, but not both. However, when the entropy rate is regarded as a Lyapunov exponent of a random matrix product, the matrix entries depend on the underlying Markov process, so the results from Lyapunov theory do not appear to apply directly. Nevertheless, much of the main idea of our proof of Theorem 1.1 is essentially contained in Peres [22]. In contrast to Peres' proof, we do not use the language of Lyapunov exponents, and we use only basic complex analysis and no functional analysis. Also, the hypotheses in [22] do not carry over to our setting. To the best of our knowledge, the statement and proof of Theorem 1.1 have not appeared in the literature. For analyticity of certain other statistical quantities, see also related work in the area of statistical physics [?].
After discussing background in Sections 2 and 3, we prove Theorem 1.1 in Section 4. As an example, we show that the entropy rate of a hidden Markov chain obtained by observing a binary Markov chain in binary symmetric noise, with noise parameter $\varepsilon$, is analytic at any $\varepsilon = \varepsilon_0 > 0$, provided that the Markov transition probabilities are all positive.

In Section 5, we infer from the proof of Theorem 1.1 a general principle to determine a domain of analyticity for the entropy rate. We apply this to the case of hidden Markov chains obtained from binary Markov chains in binary symmetric noise to find a lower bound on the radius of convergence of a power series in $\varepsilon$ at $\varepsilon_0 = 0$. Given the recent results of [33], which compute the derivatives of all orders at $\varepsilon_0 = 0$, this gives an explicit power series for the entropy rate near $\varepsilon_0 = 0$.

In Section 6, we show how to relax the conditions of Theorem 1.1 and apply this to give more examples where the entropy rate is analytic.

The entropy rate can fail to be analytic. In Section 7, we give examples and then give a complete set of necessary and sufficient conditions for analyticity in the special case of binary hidden Markov chains with an unambiguous symbol, i.e., a symbol which can be produced by only one symbol of the Markov chain.

Finally, in Section 8 we resort to more advanced techniques to prove a stronger version of Theorem 1.1, namely Theorem 8.1. This result gives a sense in which the hidden Markov chain itself varies analytically with $\varepsilon$. The proof of this result requires some measure theory and functional analysis, along with ideas from equilibrium states [21], which are reviewed in an appendix. Our first proof of Theorem 1.1 was derived as a consequence of Theorem 8.1. It also follows from Theorem 8.1 that, in principle, many statistical properties in addition to the entropy rate vary analytically.

Most results of this paper were first announced in [?].

2 Iteration on the Simplex 

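Let $W$ be the simplex, comprising the vectors

$$\{w = (w_1, w_2, \ldots, w_B) \in \mathbb{R}^B : w_i \ge 0,\ \sum_i w_i = 1\},$$

and let $W_a$ be all $w \in W$ with $w_i = 0$ for $\Phi(i) \ne a$. Let $W^{\mathbb{C}}$ denote the complex version of $W$, i.e., $W^{\mathbb{C}}$ denotes the complex simplex comprising the vectors

$$\{w = (w_1, w_2, \ldots, w_B) \in \mathbb{C}^B : \sum_i w_i = 1\},$$

and let $W_a^{\mathbb{C}}$ denote the complex version of $W_a$, i.e., $W_a^{\mathbb{C}}$ consists of all $w \in W^{\mathbb{C}}$ with $w_i = 0$ for $\Phi(i) \ne a$. For $a \in \mathcal{A}$, let $A_a$ denote the $B \times B$ matrix such that $A_a(i, j) = A(i, j)$ for $j$ with $\Phi(j) = a$, and $A_a(i, j) = 0$ otherwise. For $a \in \mathcal{A}$, define the scalar-valued and vector-valued functions $r_a$ and $f_a$ on $W$ by

$$r_a(w) = w A_a \mathbf{1}$$

and

$$f_a(w) = w A_a / r_a(w),$$

where $\mathbf{1}$ denotes the all-ones column vector.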
Note that $f_a$ defines the action of the matrix $A_a$ on the simplex $W$. For any fixed $n$ and $z_{-n}^i$, define

$$x_i = x_i(z_{-n}^i) = p(y_i = \cdot \mid z_i, z_{i-1}, \ldots, z_{-n}) \qquad (2.1)$$

(here $\cdot$ represents the states of the Markov chain $Y$); then, following Blackwell, $\{x_i\}$ satisfies the random dynamical iteration

$$x_{i+1} = f_{z_{i+1}}(x_i), \qquad (2.2)$$

starting with

$$x_{-n-1} = p(y_{-n-1} = \cdot). \qquad (2.3)$$
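The iteration (2.1)-(2.3) is the usual forward filter for a hidden Markov chain. The sketch below is ours, assumes NumPy, and uses hypothetical helper names, 0-based state labels and a randomly generated test matrix. It runs the iteration over a given past $z_{-n}, \ldots, z_{-1}$ and returns $r_a(x_{-1}) = p(z_0 = a \mid z_{-n}^{-1})$ for every symbol $a$.

```python
import numpy as np


def stationary(A):
    """Stationary row vector p with p A = p, sum(p) = 1 (assumes 1 is a simple eigenvalue)."""
    B = A.shape[0]
    M = A.T - np.eye(B)
    M[-1, :] = 1.0
    rhs = np.zeros(B)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)


def symbol_matrices(A, phi):
    """A_a keeps the columns j with phi[j] == a and zeroes all other columns."""
    return [A * (phi == a)[None, :] for a in range(phi.max() + 1)]


def r(w, Aa):
    """r_a(w) = w A_a 1."""
    return float(w @ Aa.sum(axis=1))


def f(w, Aa):
    """f_a(w) = w A_a / r_a(w); maps the simplex W into W_a."""
    v = w @ Aa
    return v / v.sum()


def predictive(A, phi, past):
    """Run (2.2)-(2.3) over past = [z_{-n}, ..., z_{-1}] and return p(z_0 = a | past) for all a."""
    mats = symbol_matrices(A, phi)
    x = stationary(A)                       # x_{-n-1} = p(y_{-n-1} = .)
    for z in past:
        x = f(x, mats[z])                   # x_{i+1} = f_{z_{i+1}}(x_i)
    return np.array([r(x, Aa) for Aa in mats])


rng = np.random.default_rng(0)
A = rng.uniform(1.0, 2.0, size=(4, 4))
A /= A.sum(axis=1, keepdims=True)           # hypothetical strictly positive stochastic matrix
phi = np.array([0, 0, 1, 1])                # Phi on 0-based states
past = [0, 1, 1, 0, 1]
print(predictive(A, phi, past))
```

Normalizing at every step, as $f_a$ does, keeps $x_i$ on the simplex; the unnormalized products $p(y = \cdot) A_{z_{-n}} \cdots A_{z_{-1}}$ give the same ratios but underflow for long pasts.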
We remark that Blackwell showed that

$$H(Z) = -\int \sum_a r_a(w) \log r_a(w)\, dQ(w), \qquad (2.4)$$

where $Q$, known as Blackwell's measure, is the limiting probability distribution, as $n \to \infty$, of $\{x_0\}$ on $W$. However, we do not use Blackwell's measure explicitly in this paper.

Next, we consider two metrics on a compact subset $S$ of the interior of a subsimplex $W'$ of $W$. Without loss of generality, we assume that $W'$ consists of all points from $W$ with the last $B - k$ coordinates equal to 0. The Euclidean metric on $S$ is defined as usual, namely for

$$u = (u_1, u_2, \ldots, u_B),\quad v = (v_1, v_2, \ldots, v_B) \in S,$$

we have

$$d_E(u, v) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_k - v_k)^2}.$$

The Hilbert metric [?] $d_H$ on $S$ is defined as follows:

$$d_H(u, v) = \max_{i, j \le k} \log\left(\frac{u_i / u_j}{v_i / v_j}\right).$$
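For concreteness, here is a small NumPy sketch of the two metrics (ours; the function names are hypothetical). It expects only the $k$ positive coordinates of $u$ and $v$. Since $\log\frac{u_i/u_j}{v_i/v_j} = t_i - t_j$ with $t = \log u - \log v$, the maximum over $i, j$ is simply $\max t - \min t$.

```python
import numpy as np


def hilbert_distance(u, v):
    """d_H(u, v) = max_{i,j} log((u_i/u_j)/(v_i/v_j)) for strictly positive u, v."""
    t = np.log(np.asarray(u, dtype=float)) - np.log(np.asarray(v, dtype=float))
    # log((u_i/u_j)/(v_i/v_j)) = t_i - t_j, so the maximum over (i, j) is max(t) - min(t)
    return float(t.max() - t.min())


def euclidean_distance(u, v):
    """d_E(u, v) on the same coordinates."""
    return float(np.linalg.norm(np.asarray(u, dtype=float) - np.asarray(v, dtype=float)))


u = np.array([0.5, 0.5])
v = np.array([0.25, 0.75])
print(hilbert_distance(u, v), euclidean_distance(u, v))   # log 3 and sqrt(1/8)
```

Because $d_H$ depends only on coordinate ratios, it is unchanged if $u$ or $v$ is rescaled; this is why the normalization by $r_a(w)$ in $f_a$ plays no role in Hilbert-metric estimates.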



The following result is well known (see, for instance, [?]). For completeness, we give a detailed proof in an appendix.

Proposition 2.1. $d_E$ and $d_H$ are equivalent (denoted by $d_E \sim d_H$) on any compact subset $S$ of the interior of a subsimplex $W'$ of $W$, i.e., there are positive constants $C_1 < C_2$ such that for any two points $u, v \in S$,

$$C_1 d_E(u, v) \le d_H(u, v) \le C_2 d_E(u, v).$$

Proposition 2.2. Assume that at $\varepsilon_0$, $A$ satisfies conditions 1 and 2 of Theorem 1.1. Then for sufficiently large $n$ and all choices of $a_1, \ldots, a_n$ and $b$, the mapping $f_{a_n} \circ f_{a_{n-1}} \circ \cdots \circ f_{a_1}$ is a contraction mapping under the Euclidean metric on $\hat{W}_b$, where $\hat{W}_b = f_b(W)$.

Proof. $\hat{W}_b = f_b(W)$ is a compact subset of the interior of some subsimplex of $W_b$; this subsimplex corresponds to column indices $j$ such that $\Phi(j) = b$ and the $j$-th column is strictly positive. Therefore one can define the Hilbert metric accordingly on $\hat{W}_b$. Each $f_a$ is a contraction mapping on each $\hat{W}_b$ under the Hilbert metric [?]; namely, there exists $0 < \rho < 1$ such that for any $a$ and $b$, and for any two points $u, v \in \hat{W}_b$,

$$d_H(f_a(u), f_a(v)) \le \rho\, d_H(u, v).$$

Thus, for any choices of $a_2, a_3, \ldots, a_n$, we have

$$d_H\big((f_{a_n} \circ \cdots \circ f_{a_2})(u), (f_{a_n} \circ \cdots \circ f_{a_2})(v)\big) \le \rho^{n-1} d_H(u, v).$$

By Proposition 2.1, there exists a positive constant $C$ such that

$$d_E\big((f_{a_n} \circ \cdots \circ f_{a_2})(u), (f_{a_n} \circ \cdots \circ f_{a_2})(v)\big) \le C \rho^{n-1} d_E(u, v).$$

Let $L$ be a universal Lipschitz constant for any $f_c : \hat{W}_b \to \hat{W}_c$ with respect to the Euclidean metric. Choose $n$ large enough such that $C\rho^{n-1} < 1/L$. So, for sufficiently large $n$, any composition of the form $f_{a_n} \circ \cdots \circ f_{a_1}$ is a Euclidean contraction on $\hat{W}_b$.

□
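The Hilbert-metric contraction invoked in this proof has a standard quantitative form, Birkhoff's contraction theorem: a strictly positive matrix $M$ acting on row vectors satisfies $d_H(uM, vM) \le \tanh(\Delta(M)/4)\, d_H(u, v)$, where $\Delta(M) = \max_{i,j,k,l} \log \frac{M_{ik}M_{jl}}{M_{il}M_{jk}}$. The proof cites this kind of result rather than restating it, so the sketch below (ours; `birkhoff_tau` is a hypothetical name and the matrix is random test data) simply checks the inequality on random positive vectors. Since $d_H$ ignores scaling, the same bound applies to $f_a$ restricted to a strictly positive block of $A_a$.

```python
import numpy as np


def hilbert_distance(u, v):
    t = np.log(u) - np.log(v)
    return float(t.max() - t.min())


def birkhoff_tau(M):
    """Birkhoff contraction coefficient tanh(Delta/4) of a strictly positive matrix acting on row vectors."""
    L = np.log(M)
    D = L[:, None, :] - L[None, :, :]                         # D[i, j, k] = log M[i, k] - log M[j, k]
    delta = float((D.max(axis=2) - D.min(axis=2)).max())     # projective diameter of the image cone
    return float(np.tanh(delta / 4.0))


rng = np.random.default_rng(1)
M = rng.uniform(1.0, 2.0, size=(3, 3))      # hypothetical strictly positive block
tau = birkhoff_tau(M)
worst = 0.0
for _ in range(1000):
    u, v = rng.uniform(0.05, 1.0, size=(2, 3))
    worst = max(worst, hilbert_distance(u @ M, v @ M) / hilbert_distance(u, v))
print(tau, worst)                           # the observed ratio never exceeds tau
```

With entries in $[1, 2]$ every cross-ratio is at most 4, so $\tau \le \tanh(\log 4 / 4) = 1/3$ for this test matrix.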

Remark 2.3. Using a slightly modified proof, one can show that for sufficiently large $n$, any composition of the form $f_{a_n} \circ \cdots \circ f_{a_1}$ is a Euclidean contraction on the whole simplex $W$.

3 Brief background on analyticity 

In this section, we briefly review the basics of complex analysis needed for the purpose of this paper. For more details, we refer to [?].

A real (or complex) function of several variables is analytic at a given point if it admits 
a convergent Taylor series representation in a real (or complex) neighborhood of the given 
point. We say that it is real (or complex) analytic in a neighborhood if it is real (or complex) 
analytic at each point of the neighborhood. 

The relationship between real and complex analytic functions is as follows: 1) any real analytic function can be extended to a complex analytic function on some complex neighborhood; 2) any real function obtained by restricting a complex analytic function from a complex neighborhood to a real neighborhood is a real analytic function.

The main fact regarding analytic functions used in this paper is that the uniform limit of a sequence of complex analytic functions on a fixed complex neighborhood is complex analytic. The analogous statement does not hold (in fact, it fails dramatically!) for real analytic functions.

As an example of a real-valued parametrization of a matrix, consider:

$$A(\varepsilon) = \begin{pmatrix} 2\varepsilon & \varepsilon & 1 - 3\varepsilon \\ \varepsilon & 1 - \varepsilon - \sin(\varepsilon) & \sin(\varepsilon) \\ 1 - \varepsilon^2 - \varepsilon^3 & \varepsilon^2 & \varepsilon^3 \end{pmatrix}.$$

Denote the states of $A$ by $\{1, 2, 3\}$ and let $\Phi(1) = \Phi(2) = 0$, $\Phi(3) = 1$. Each entry of $A$ is a real analytic function of $\varepsilon$ at any given point $\varepsilon = \varepsilon_0$. For $\varepsilon_0 > 0$ and sufficiently small, $A$ is stochastic (i.e., each row sums to 1 and each entry is nonnegative) and in fact strictly positive (i.e., each entry is positive). According to Theorem 1.1, for such values of $\varepsilon_0$, the entropy rate of the hidden Markov chain defined by $A(\varepsilon)$ and $\Phi$ is real analytic as a function of $\varepsilon$ at $\varepsilon_0$.

While we typically think of analytic parametrizations as having the "look" of the preceding example, there is a conceptually simpler parametrization, namely, parameterize an $n \times n$ matrix $A$ by its entries themselves; if $A$ is required to be stochastic, we choose the parameters to be any set of $n - 1$ entries in each row (so the real variable vector is an $n(n-1)$-tuple). Clearly, for analyticity it does not matter which entries are chosen. We call this the natural parametrization.

Suppose that $H(Z)$ is analytic with respect to this parametrization. Then $H(Z)$, viewed as a function of any other analytic parametrization of the entries of $A$, is the composition of two analytic functions and thus must be analytic. We thus have that the following two statements are equivalent.

• $H(Z)$ is analytic with respect to the natural parameterization.

• $H(Z)$ is analytic with respect to any analytic parameterization.

We shall use this implicitly throughout the paper.



4 Proof of Theorem 1.1

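Notation: We rewrite $A$, $Z$, $f_a(x)$, $p(z_0 \mid z_{-n}^{-1})$ with parameter vector $\varepsilon$ as $A^\varepsilon$, $Z^\varepsilon$, $f_a^\varepsilon(x)$ and $p^\varepsilon(z_0 \mid z_{-n}^{-1})$, respectively. We use the notation $\hat{W}_a$ to mean $f_a^{\varepsilon_0}(W)$. Let $\Omega_{\mathbb{C}} = \Omega_{\mathbb{C}}(r)$ denote the set of points of distance at most $r$ from $\varepsilon_0$ in the complex parameter space $\mathbb{C}^m$. Let $N_b = N_b(R)$ denote the set of all points in $W_b^{\mathbb{C}}$ of distance at most $R$ from $\hat{W}_b$.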

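We first prove that for some $r > 0$, $\log p^\varepsilon(z_0 \mid z_{-n}^{-1})$ can be extended to a complex analytic function of $\varepsilon \in \Omega_{\mathbb{C}}(r)$ and that $|\log p^\varepsilon(z_0 \mid z_{-n_1}^{-1}) - \log p^\varepsilon(z_0 \mid \hat{z}_{-n_2}^{-1})|$ decays exponentially fast in $n$ when the two pasts $z_{-n_1}^{-1}$ and $\hat{z}_{-n_2}^{-1}$ agree in their last $n$ symbols, uniformly in $\varepsilon \in \Omega_{\mathbb{C}}(r)$.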

Note that for each $a$, $b$, $f_a^\varepsilon(w)$ is a rational function of the entries of $A^\varepsilon$ and $w \in W_b$. So, by viewing the real vector variables $\varepsilon$ and $w$ as complex vector variables, we can naturally extend $f_a^\varepsilon(w)$ to a complex-valued function of the complex vector variables $\varepsilon$ and $w$. Since $A$ satisfies conditions 1 and 2 at $\varepsilon_0$, for sufficiently small $r$ and $R$ the denominator of $f_a^\varepsilon(w)$ is nonzero for $\varepsilon$ in $\Omega_{\mathbb{C}}(r)$ and $w$ in $N_b(R)$. Thus, $f_a^\varepsilon(w)$ is a complex analytic function of $(\varepsilon, w)$ in the neighborhood $\Omega_{\mathbb{C}}(r) \times N_b(R)$.

Assuming conditions 1 and 2, we claim that $A$ has an isolated (in modulus) maximum eigenvalue 1 at $\varepsilon_0$. To see this, we apply Perron-Frobenius theory [?] as follows. By permuting the indices, we can express

$$A = \begin{pmatrix} U & 0 \\ V & 0 \end{pmatrix},$$

where $U$ is the submatrix corresponding to indices with positive columns. The nonzero eigenvalues of $A$ are the same as the eigenvalues of $U$, which is a positive stochastic matrix. Such a matrix has an isolated (in modulus) maximum eigenvalue 1.
The stationary distribution $p^\varepsilon(y = \cdot)$ (the eigenvector corresponding to the maximum eigenvalue 1) is a rational function of the entries of $A^\varepsilon$, since it is a solution of the equation $vA^\varepsilon = v$. So, in the same way as for $f_a(w)$, we can naturally extend $p^\varepsilon(y = \cdot)$ to a complex analytic function on $\Omega_{\mathbb{C}}$.

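Extending (2.1), for each $i$ we define

$$x_i^\varepsilon = x_i^\varepsilon(z_{-n}^i) = p^\varepsilon(y_i = \cdot \mid z_{-n}^i), \qquad (4.5)$$

by iterating the following complexified random dynamical system (extending (2.2) and (2.3)):

$$x_{i+1}^\varepsilon = f_{z_{i+1}}^\varepsilon(x_i^\varepsilon), \qquad (4.6)$$

starting with

$$x_{-n-1}^\varepsilon = p^\varepsilon(y_{-n-1} = \cdot). \qquad (4.7)$$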

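By Proposition 2.2, for sufficiently large $n$, we can replace the set of mappings $\{f_a^{\varepsilon_0}\}$ with the set $\{f_{a_n}^{\varepsilon_0} \circ f_{a_{n-1}}^{\varepsilon_0} \circ \cdots \circ f_{a_1}^{\varepsilon_0}\}$ and then assume that each $f_a^{\varepsilon_0}$ is a Euclidean contraction on each $\hat{W}_b$ with contraction coefficient $\rho < 1$. Since $\hat{W}_b$ is compact and the definition of $\rho$-contraction is given by a strict inequality, we can choose $r$ and $R$ sufficiently small such that

$$f_a^\varepsilon \text{ is a Euclidean } \rho\text{-contraction on each } N_b(R),\quad \varepsilon \in \Omega_{\mathbb{C}}(r). \qquad (4.8)$$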
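Further, we claim that by choosing $r$ still smaller, if necessary,

$$x_i^\varepsilon \in \bigcup_b N_b(R) \quad \text{for all } i, n \text{ and all choices of } \varepsilon \in \Omega_{\mathbb{C}}(r). \qquad (4.9)$$

To see this, fixing $\rho$ and $R$, choose $r$ so small that

$$|f_a^\varepsilon(x) - f_a^{\varepsilon_0}(x)| < R(1 - \rho), \quad x \in \bigcup_b \hat{W}_b,\ \varepsilon \in \Omega_{\mathbb{C}}(r) \qquad (4.10)$$

and

$$|p^\varepsilon(y = \cdot) - p^{\varepsilon_0}(y = \cdot)| < R(1 - \rho), \quad \varepsilon \in \Omega_{\mathbb{C}}(r). \qquad (4.11)$$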
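Now consider the difference

$$x_{i+1}^\varepsilon - x_{i+1}^{\varepsilon_0} = f_{z_{i+1}}^\varepsilon(x_i^\varepsilon) - f_{z_{i+1}}^{\varepsilon_0}(x_i^{\varepsilon_0}) = f_{z_{i+1}}^\varepsilon(x_i^\varepsilon) - f_{z_{i+1}}^\varepsilon(x_i^{\varepsilon_0}) + f_{z_{i+1}}^\varepsilon(x_i^{\varepsilon_0}) - f_{z_{i+1}}^{\varepsilon_0}(x_i^{\varepsilon_0}). \qquad (4.12)$$

Then by (4.8), (4.10), and (4.11), for $i \ge -n - 1$, we have

$$|x_{i+1}^\varepsilon - x_{i+1}^{\varepsilon_0}| \le \rho |x_i^\varepsilon - x_i^{\varepsilon_0}| + R(1 - \rho).$$

So, since $|x_{-n-1}^\varepsilon - x_{-n-1}^{\varepsilon_0}| < R(1 - \rho) < R$ by (4.11), induction on $i$ gives

$$|x_{i+1}^\varepsilon - x_{i+1}^{\varepsilon_0}| < \rho R + R(1 - \rho) = R,$$

and thus, since $x_{i+1}^{\varepsilon_0} \in \hat{W}_{z_{i+1}}$, for all $i$ we have $x_{i+1}^\varepsilon \in \bigcup_b N_b(R)$, yielding (4.9). Each $x_i^\varepsilon$ is the composition of analytic functions on $\Omega_{\mathbb{C}}(r)$ and so is complex analytic on $\Omega_{\mathbb{C}}(r)$.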

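For $0 \le n_1, n_2 \le \infty$, we say two sequences $z_{-n_1}^{-1}$ and $\hat{z}_{-n_2}^{-1}$ have a common tail if there exists $n \ge 0$ with $n \le n_1, n_2$ such that $z_i = \hat{z}_i$ for $-n \le i \le -1$ (denoted by $z_{-n_1}^{-1} \sim_n \hat{z}_{-n_2}^{-1}$).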



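Let

$$x_i^\varepsilon = x_i^\varepsilon(z_{-n_1}^i) = p^\varepsilon(y_i = \cdot \mid z_{-n_1}^i), \qquad \hat{x}_i^\varepsilon = \hat{x}_i^\varepsilon(\hat{z}_{-n_2}^i) = p^\varepsilon(y_i = \cdot \mid \hat{z}_{-n_2}^i).$$

Then, for $z_{-n_1}^{-1} \sim_n \hat{z}_{-n_2}^{-1}$ and $-n - 1 \le i \le -2$, both sequences are driven by the same maps:

$$x_{i+1}^\varepsilon = f_{z_{i+1}}^\varepsilon(x_i^\varepsilon), \qquad \hat{x}_{i+1}^\varepsilon = f_{z_{i+1}}^\varepsilon(\hat{x}_i^\varepsilon).$$

From (4.8) and (4.9), it follows that there exists a positive constant $L$ independent of $n_1$ and $n_2$ such that

$$|x_{-1}^\varepsilon - \hat{x}_{-1}^\varepsilon| \le L \rho^n. \qquad (4.13)$$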

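Naturally,

$$p^\varepsilon(z_0 \mid z_{-n_1}^{-1}) = \sum_{y_{-1}} \sum_{y_0 : \Phi(y_0) = z_0} A^\varepsilon(y_{-1}, y_0)\, p^\varepsilon(y_{-1} \mid z_{-n_1}^{-1}). \qquad (4.14)$$

Then there is a positive constant $L'$, independent of $n_1, n_2$, such that

$$|p^\varepsilon(z_0 \mid z_{-n_1}^{-1}) - p^\varepsilon(z_0 \mid \hat{z}_{-n_2}^{-1})| \le L' \rho^n. \qquad (4.15)$$

Since $A^{\varepsilon_0}$ satisfies conditions 1 and 2, $p^\varepsilon(z_0 \mid z_{-n}^{-1})$ is bounded away from 0, uniformly in $\varepsilon \in \Omega_{\mathbb{C}}(r)$, $n$, and choices of $z_{-n}^0$; thus there is a positive constant $L''$, independent of $n_1, n_2$, such that

$$|\log p^\varepsilon(z_0 \mid z_{-n_1}^{-1}) - \log p^\varepsilon(z_0 \mid \hat{z}_{-n_2}^{-1})| \le L'' \rho^n. \qquad (4.16)$$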
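The exponential forgetting behind (4.13)-(4.16) is easy to observe for real parameters. The sketch below is ours; the random matrix and symbol sequence are hypothetical test data. It computes $\log p(z_0 \mid z_{-n}^{-1})$ with the iteration (2.2)-(2.3) for increasing $n$ and compares each value with the one obtained from a longer past (of length 40) that shares the same last $n$ symbols. It only illustrates the real case; the complex extension used in the proof is not exercised.

```python
import numpy as np


def stationary(A):
    B = A.shape[0]
    M = A.T - np.eye(B)
    M[-1, :] = 1.0
    rhs = np.zeros(B)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)


def log_cond_prob(A, phi, z, n):
    """log p(z_0 | z_{-n}^{-1}) via the iteration (2.2)-(2.3); the last entry of z plays the role of z_0."""
    mats = [A * (phi == a)[None, :] for a in range(phi.max() + 1)]
    x = stationary(A)
    for sym in z[len(z) - 1 - n:len(z) - 1]:     # the n symbols just before z_0
        v = x @ mats[sym]
        x = v / v.sum()
    return float(np.log(x @ mats[z[-1]].sum(axis=1)))


rng = np.random.default_rng(2)
A = rng.uniform(1.0, 2.0, size=(4, 4))
A /= A.sum(axis=1, keepdims=True)               # strictly positive stochastic matrix
phi = np.array([0, 0, 1, 1])
z = rng.integers(0, 2, size=41)                 # z_{-40}, ..., z_{-1}, z_0
logs = [log_cond_prob(A, phi, z, n) for n in range(41)]
gaps = [abs(logs[n] - logs[40]) for n in range(41)]
print(gaps[:5], gaps[25])                       # the gaps shrink roughly geometrically in n
```

For this test matrix every $2 \times 2$ block has cross-ratios at most 4, so each step contracts the Hilbert distance by at least a factor 3, which is why the gap at $n = 25$ is already at the level of rounding error.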
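Since for each $y \in \{1, \ldots, B\}$, $p^\varepsilon(y)$ is analytic, from

$$p^\varepsilon(z) = \sum_{y : \Phi(y) = z} p^\varepsilon(y)$$

we deduce that $p^\varepsilon(z)$ is analytic. Furthermore, since $p^\varepsilon(z_0 \mid z_{-n}^{-1})$ is analytic on $\Omega_{\mathbb{C}}$, we conclude that $p^\varepsilon(z_{-n}^0)$ is analytic on $\Omega_{\mathbb{C}}$.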
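Choose $\sigma$ so that

$$1 < \sigma < 1/\rho.$$

If $r$ and $R$ are chosen sufficiently small, then

$$\sum_{z_0} |p^\varepsilon(z_0 \mid z_{-n}^{-1})| \le \sigma, \quad \varepsilon \in \Omega_{\mathbb{C}}(r) \text{ and all sequences } z_{-n}^{-1} \qquad (4.17)$$

and

$$\sum_{z_0} |p^\varepsilon(z_0)| \le \sigma, \quad \varepsilon \in \Omega_{\mathbb{C}}(r). \qquad (4.18)$$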

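Then we have

$$\sum_{z_{-n-1}^0} |p^\varepsilon(z_{-n-1}^0)| = \sum_{z_{-n-1}^0} |p^\varepsilon(z_{-n-1}^{-1})\, p^\varepsilon(z_0 \mid z_{-n-1}^{-1})| \le \sum_{z_{-n-1}^{-1}} |p^\varepsilon(z_{-n-1}^{-1})| \sum_{z_0} |p^\varepsilon(z_0 \mid z_{-n-1}^{-1})| \le \sigma \sum_{z_{-n-1}^{-1}} |p^\varepsilon(z_{-n-1}^{-1})|,$$

implying

$$\sum_{z_{-n-1}^0} |p^\varepsilon(z_{-n-1}^0)| \le \sigma^{n+2}. \qquad (4.19)$$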
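Let

$$H_n^\varepsilon(Z) = -\sum_{z_{-n}^0} p^\varepsilon(z_{-n}^0) \log p^\varepsilon(z_0 \mid z_{-n}^{-1})$$

and

$$\rho_1 = \rho\sigma < 1;$$

then, using $\sum_{z_{-n-1}} p^\varepsilon(z_{-n-1}^0) = p^\varepsilon(z_{-n}^0)$, we have

$$|H_{n+1}^\varepsilon(Z) - H_n^\varepsilon(Z)| = \Big|\sum_{z_{-n-1}^0} p^\varepsilon(z_{-n-1}^0)\big(\log p^\varepsilon(z_0 \mid z_{-n-1}^{-1}) - \log p^\varepsilon(z_0 \mid z_{-n}^{-1})\big)\Big| \le \sigma^2 L'' \rho_1^n;$$

here the latter inequality follows from (4.16) and (4.19). Thus, for $m > n$,

$$|H_m^\varepsilon(Z) - H_n^\varepsilon(Z)| \le \frac{\sigma^2 L'' \rho_1^n}{1 - \rho_1}.$$

This establishes the uniform convergence of $H_n^\varepsilon(Z)$ to a limit $H^\varepsilon(Z)$. By Theorem 2.4.1 of [?], the uniform limit of complex analytic functions on a fixed complex neighborhood is analytic on that neighborhood, and so $H^\varepsilon(Z)$ is analytic on $\Omega_{\mathbb{C}}$.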

For real $\varepsilon$, $H^\varepsilon(Z)$ coincides with the entropy rate function $H(Z^\varepsilon)$, and so Theorem 1.1 follows.

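Example 4.1. Consider a binary symmetric channel with crossover probability $\varepsilon$. Let $\{Y_n\}$ be the input Markov chain with the transition matrix

$$\Pi = \begin{pmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{pmatrix}. \qquad (4.20)$$

At time $n$ the channel can be characterized by the following equation

$$Z_n = Y_n \oplus E_n,$$

where $\oplus$ denotes binary addition, $E_n$ denotes the i.i.d. binary noise with $p_E(0) = 1 - \varepsilon$ and $p_E(1) = \varepsilon$, and $Z_n$ denotes the corrupted output. Then $(Y_n, E_n)$ is jointly Markov, so $\{Z_n\}$ is a hidden Markov chain with the corresponding

$$A = \begin{pmatrix} \pi_{00}(1-\varepsilon) & \pi_{00}\varepsilon & \pi_{01}(1-\varepsilon) & \pi_{01}\varepsilon \\ \pi_{00}(1-\varepsilon) & \pi_{00}\varepsilon & \pi_{01}(1-\varepsilon) & \pi_{01}\varepsilon \\ \pi_{10}(1-\varepsilon) & \pi_{10}\varepsilon & \pi_{11}(1-\varepsilon) & \pi_{11}\varepsilon \\ \pi_{10}(1-\varepsilon) & \pi_{10}\varepsilon & \pi_{11}(1-\varepsilon) & \pi_{11}\varepsilon \end{pmatrix};$$

here the states $1, 2, 3, 4$ correspond to $(Y_n, E_n) = (0,0), (0,1), (1,0), (1,1)$, and $\Phi$ maps states 1 and 4 to 0 and maps states 2 and 3 to 1. This class of hidden Markov chains has been studied extensively (e.g., [11], [19]).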

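By Theorem 1.1, when $\varepsilon$ and the $\pi_{ij}$'s are positive, the entropy rate $H(Z)$ is analytic as a function of $\varepsilon$ and the $\pi_{ij}$'s. This still holds when $\varepsilon = 0$ and the $\pi_{ij}$'s are positive, because in this case we have

$$A = \begin{pmatrix} \pi_{00} & 0 & \pi_{01} & 0 \\ \pi_{00} & 0 & \pi_{01} & 0 \\ \pi_{10} & 0 & \pi_{11} & 0 \\ \pi_{10} & 0 & \pi_{11} & 0 \end{pmatrix},$$

whose columns are either all zero or strictly positive, with the positive columns 1 and 3 mapped by $\Phi$ to 0 and 1, respectively; thus conditions 1 and 2 of Theorem 1.1 hold.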
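Example 4.1 is concrete enough to compute. The sketch below is ours; `bsc_hmm`, `conditional_entropy` and the transition matrix `P` are hypothetical. It builds the $4 \times 4$ matrix above with `numpy.kron` (states ordered as in the example, 0-based) and evaluates $H(Z_0 \mid Z_{-n}^{-1})$ by enumerating all binary blocks, which is feasible only for small $n$. At $\varepsilon = 0$ the output is the input chain itself, so the value equals the entropy rate of $Y$; at $\varepsilon = 1/2$ the output is i.i.d. uniform, so the value is $\log 2$.

```python
import numpy as np


def stationary(A):
    B = A.shape[0]
    M = A.T - np.eye(B)
    M[-1, :] = 1.0
    rhs = np.zeros(B)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)


def bsc_hmm(P, eps):
    """Matrix of the pair (Y_n, E_n), states (0,0), (0,1), (1,0), (1,1), and phi(y, e) = y XOR e."""
    noise = np.array([[1.0 - eps, eps],
                      [1.0 - eps, eps]])      # the next noise bit does not depend on the current state
    A = np.kron(P, noise)                     # A[(y,e),(y',e')] = P[y, y'] * p_E(e')
    phi = np.array([0, 1, 1, 0])
    return A, phi


def conditional_entropy(A, phi, n):
    """H(Z_0 | Z_{-n}^{-1}) = H(Z_{-n}^0) - H(Z_{-n}^{-1}), by enumerating all binary blocks."""
    mats = [A * (phi == a)[None, :] for a in (0, 1)]
    alphas = stationary(A)[None, :]
    block_entropy = []
    for _ in range(n + 1):
        alphas = np.concatenate([alphas @ Aa for Aa in mats])   # one row per block z_1 ... z_k
        p = alphas.sum(axis=1)
        p = p[p > 0]
        block_entropy.append(float(-(p * np.log(p)).sum()))
    return block_entropy[n] - block_entropy[n - 1] if n > 0 else block_entropy[0]


P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
for eps in (0.0, 0.05, 0.1):
    A, phi = bsc_hmm(P, eps)
    print(eps, [round(conditional_entropy(A, phi, n), 6) for n in (1, 4, 8)])
```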
5 Domain of Analyticity

Suppose $A$ is analytically parameterized by a vector variable $\varepsilon$, and conditions 1 and 2 in Theorem 1.1 are satisfied at $\varepsilon = \varepsilon_0$. In principle, the proof of Theorem 1.1 determines a neighborhood $\Omega_{\mathbb{C}}(r)$ of $\varepsilon_0$ on which the entropy rate is analytic. Specifically, if one can find $\rho$, $r$ and $R$ such that all of the following hold, then the entropy rate is analytic on $\Omega_{\mathbb{C}}(r)$.

1. Find $\rho$ such that each $f_a^{\varepsilon_0}$ is a Euclidean $\rho$-contraction on each $\hat{W}_b$. Then choose positive $r$, $R$ such that for all $\varepsilon \in \Omega_{\mathbb{C}}(r)$, each $f_a^\varepsilon$ is a Euclidean $\rho$-contraction on each $N_b(R)$ (see (4.8)).

2. Next find $r$ smaller (if necessary) such that for all $\varepsilon \in \Omega_{\mathbb{C}}(r)$, the image of the stationary vector of $A^\varepsilon$, under any composition of the mappings $\{f_a^\varepsilon\}$, stays within $\bigcup_b N_b(R)$ (see (4.9)). Note that the argument in the proof shows that this holds if (4.10) and (4.11) hold.

3. Finally, find $r$, $R$ such that the sum of the absolute values of the complexified conditional probabilities, conditioned on any given past symbol sequence, is less than $1/\rho$, and similarly for the sum of the absolute values of the complexified stationary probabilities (see (4.17) and (4.18)).

In fact, the proof shows that one can always find such $\rho$, $r$, $R$, but in condition 1 above one may need to replace the $f_a$'s by all $n$-fold compositions of the $f_a$'s, for some $n$.

Recall from Example 4.1 the family of hidden Markov chains determined by passing a binary Markov chain through a binary symmetric channel with crossover probability $\varepsilon$. Recall that $H(Z^\varepsilon)$ is an analytic function of $\varepsilon$ at $\varepsilon = 0$ when the Markov transition probabilities are all positive. We shall determine a complex neighborhood of $0$ such that the entropy rate, as a function of $\varepsilon$, is analytic on this neighborhood.

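Let $u_n = p(y_n = 0 \mid z^n)$ and $v_n = p(y_n = 1 \mid z^n)$. For $z_{n+1} = 1$ we have

$$u_{n+1} = \frac{\varepsilon(\pi_{00}u_n + \pi_{10}v_n)}{\varepsilon(\pi_{00}u_n + \pi_{10}v_n) + (1-\varepsilon)(\pi_{01}u_n + \pi_{11}v_n)}, \qquad v_{n+1} = \frac{(1-\varepsilon)(\pi_{01}u_n + \pi_{11}v_n)}{\varepsilon(\pi_{00}u_n + \pi_{10}v_n) + (1-\varepsilon)(\pi_{01}u_n + \pi_{11}v_n)}.$$

Since $u_n + v_n = 1$, $u_{n+1}$ is a function of $u_n$; let $g_1$ denote this function.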

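For $z_{n+1} = 0$ we have

$$u_{n+1} = \frac{(1-\varepsilon)(\pi_{00}u_n + \pi_{10}v_n)}{(1-\varepsilon)(\pi_{00}u_n + \pi_{10}v_n) + \varepsilon(\pi_{01}u_n + \pi_{11}v_n)}, \qquad v_{n+1} = \frac{\varepsilon(\pi_{01}u_n + \pi_{11}v_n)}{(1-\varepsilon)(\pi_{00}u_n + \pi_{10}v_n) + \varepsilon(\pi_{01}u_n + \pi_{11}v_n)}.$$

Again, $u_{n+1}$ is a function of $u_n$; let $g_0$ denote this function.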
And for the conditional probability, we have

$$p(z_{n+1} = 0 \mid z^n) = ((1-\varepsilon)\pi_{00} + \varepsilon\pi_{01})u_n + ((1-\varepsilon)\pi_{10} + \varepsilon\pi_{11})v_n.$$

Since $u_n + v_n = 1$, $p(z_{n+1} = 0 \mid z^n)$ is a function of $u_n$; let $r_0$ denote this function. And

$$p(z_{n+1} = 1 \mid z^n) = (\varepsilon\pi_{00} + (1-\varepsilon)\pi_{01})u_n + (\varepsilon\pi_{10} + (1-\varepsilon)\pi_{11})v_n.$$

Again, $p(z_{n+1} = 1 \mid z^n)$ is a function of $u_n$; let $r_1$ denote this function.

Note that $g_0$, $g_1$, $r_0$, $r_1$ are all implicitly parameterized by $\varepsilon$. The stationary vector $(\pi_0, \pi_1)$ of $Y$, which does not depend on $\varepsilon$, is equal to $(\pi_{10}/(\pi_{10} + \pi_{01}),\ \pi_{01}/(\pi_{10} + \pi_{01}))$.

We shall choose $\rho$ with $0 < \rho < 1$, $r > 0$ and $R > 0$ such that for all $\varepsilon$ with $|\varepsilon| < r$:

1. $g_0$ and $g_1$ are $\rho$-contraction mappings on the $R$-neighborhoods of 0 and 1 in the complex plane,

2. the set of all $g_{a_n} \circ g_{a_{n-1}} \circ \cdots \circ g_{a_1}(\pi_0)$ lies within the $R$-neighborhoods of 0 and 1,

3. $|r_0(u)| + |r_1(u)| < 1/\rho$ for $u$ in the $R$-neighborhoods of 0 and 1 in the complex plane.

By the general principle above, the entropy rate should then be analytic on $|\varepsilon| < r$. More concretely, conditions 1, 2 and 3 translate to (here $\rho < 1$):

1. $|g_0'(u)| < \rho$, $|g_1'(u)| < \rho$ on $\{|\varepsilon| < r \text{ and } |u| < R\}$ and $\{|\varepsilon| < r \text{ and } |1 - u| < R\}$,

2. $\max\{|g_0(0) - 1|, |g_0(1) - 1|, |g_1(0)|, |g_1(1)|\} < R(1 - \rho)$ on $|\varepsilon| < r$ (this follows from (4.10); (4.11) is trivial since the stationary vector of $Y$ does not depend on $\varepsilon$),

3. $|r_0(u)| + |r_1(u)| < 1/\rho$ on $\{|\varepsilon| < r \text{ and } |u| < R\}$ and $\{|\varepsilon| < r \text{ and } |1 - u| < R\}$.

A straightforward computation shows that the following conditions guarantee conditions 1, 2, 3, where we write $\delta = \pi_{00}\pi_{11} - \pi_{01}\pi_{10}$ and $\kappa = |\pi_{00} - \pi_{01} - \pi_{10} + \pi_{11}|$ (all denominators are required to be positive):

$$\frac{\sqrt{r(1+r)|\delta|}}{\pi_{11} - |\pi_{10} - \pi_{11}|r - (\kappa r + |\pi_{01} - \pi_{11}|)R} < \sqrt{\rho}, \qquad \frac{\sqrt{r(1+r)|\delta|}}{\pi_{01} - |\pi_{00} - \pi_{01}|r - (\kappa r + |\pi_{01} - \pi_{11}|)R} < \sqrt{\rho},$$

$$\frac{\sqrt{r(1+r)|\delta|}}{\pi_{10} - |\pi_{11} - \pi_{10}|r - (\kappa r + |\pi_{00} - \pi_{10}|)R} < \sqrt{\rho}, \qquad \frac{\sqrt{r(1+r)|\delta|}}{\pi_{00} - |\pi_{01} - \pi_{00}|r - (\kappa r + |\pi_{00} - \pi_{10}|)R} < \sqrt{\rho},$$

$$\frac{\pi_{01} r}{\pi_{00} - |\pi_{01} - \pi_{00}|r} < R(1-\rho), \qquad \frac{\pi_{11} r}{\pi_{10} - |\pi_{11} - \pi_{10}|r} < R(1-\rho),$$

$$\frac{\pi_{10} r}{\pi_{11} - |\pi_{10} - \pi_{11}|r} < R(1-\rho), \qquad \frac{\pi_{00} r}{\pi_{01} - |\pi_{00} - \pi_{01}|r} < R(1-\rho),$$

$$(\kappa r + |\pi_{01} - \pi_{11}|)R + |\pi_{10} - \pi_{11}|r + \pi_{11} + (\kappa r + |\pi_{00} - \pi_{10}|)R + |\pi_{10} - \pi_{11}|r + \pi_{10} < 1/\rho,$$

$$(\kappa r + |\pi_{01} - \pi_{11}|)R + |\pi_{00} - \pi_{01}|r + \pi_{01} + (\kappa r + |\pi_{00} - \pi_{10}|)R + |\pi_{00} - \pi_{01}|r + \pi_{00} < 1/\rho.$$

The first four inequalities bound $|g_1'|$ near 0 and 1 and $|g_0'|$ near 0 and 1 (condition 1), the next four bound $|g_0(1) - 1|$, $|g_0(0) - 1|$, $|g_1(0)|$ and $|g_1(1)|$ (condition 2), and the last two bound $|r_0| + |r_1|$ near 0 and near 1 (condition 3).

In other words, for given $\rho$ with $0 < \rho < 1$, choose $r$ and $R$ to satisfy all the constraints above. Then the entropy rate is an analytic function of $\varepsilon$ on $|\varepsilon| < r$.
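The three conditions can also be probed numerically before attempting the hand estimates. The NumPy sketch below is ours; `check_conditions`, the grid resolution and the sample parameters are hypothetical choices. It evaluates $g_0, g_1, r_0, r_1$ for complex $\varepsilon$ and $u$ on a polar grid and reports whether conditions 1, 2 and 3 hold at the sampled points. Sampling a grid is only a sanity check, not a proof; the explicit inequalities above are what certify a radius.

```python
import numpy as np


def g(a, u, eps, P):
    """u_{n+1} as a function of u_n when z_{n+1} = a (the maps g_0 and g_1)."""
    to0 = P[0, 0] * u + P[1, 0] * (1 - u)                            # pi_00 u_n + pi_10 v_n
    to1 = P[0, 1] * u + P[1, 1] * (1 - u)                            # pi_01 u_n + pi_11 v_n
    w0, w1 = ((1 - eps), eps) if a == 0 else (eps, (1 - eps))      # p(z = a | y = 0), p(z = a | y = 1)
    return w0 * to0 / (w0 * to0 + w1 * to1)


def r(a, u, eps, P):
    """p(z_{n+1} = a | z^n) as a function of u_n (the maps r_0 and r_1)."""
    q0 = ((1 - eps) * P[0, 0] + eps * P[0, 1]) * u + ((1 - eps) * P[1, 0] + eps * P[1, 1]) * (1 - u)
    return q0 if a == 0 else 1 - q0                                  # r_0 + r_1 = 1 identically


def disk(radius, n_r=8, n_t=32):
    """Sample points of the closed complex disk of the given radius (polar grid)."""
    rad = np.linspace(0.0, radius, n_r)
    ang = np.exp(2j * np.pi * np.arange(n_t) / n_t)
    return (rad[:, None] * ang[None, :]).ravel()


def check_conditions(P, rho, r_eps, R, h=1e-6):
    """Sample conditions 1-3 on |eps| <= r_eps and |u - c| <= R, c in {0, 1}; not a proof."""
    E = disk(r_eps)[:, None]                 # complex noise parameters, as a column
    U = disk(R)[None, :]                     # offsets from the centres 0 and 1, as a row
    cond1 = cond3 = True
    for c in (0.0, 1.0):
        u = c + U
        for a in (0, 1):
            # g_a is analytic in u, so a real-direction central difference gives the complex derivative
            deriv = (g(a, u + h, E, P) - g(a, u - h, E, P)) / (2 * h)
            cond1 &= bool(np.abs(deriv).max() < rho)
        cond3 &= bool((np.abs(r(0, u, E, P)) + np.abs(r(1, u, E, P))).max() < 1 / rho)
    Ec = disk(r_eps)
    worst = max(np.abs(g(0, 0.0, Ec, P) - 1).max(), np.abs(g(0, 1.0, Ec, P) - 1).max(),
                np.abs(g(1, 0.0, Ec, P)).max(), np.abs(g(1, 1.0, Ec, P)).max())
    cond2 = bool(worst < R * (1 - rho))
    return cond1, cond2, cond3


P = np.array([[0.6, 0.4],
              [0.3, 0.7]])
print(check_conditions(P, rho=0.5, r_eps=0.005, R=0.1))
print(check_conditions(P, rho=0.5, r_eps=0.3, R=0.1))
```

At $\varepsilon = 0$ the maps collapse to $g_0 \equiv 1$ and $g_1 \equiv 0$, which is why the neighborhoods are centred at 0 and 1; condition 2 is typically the first to fail as $r$ grows.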

6 Relaxed Conditions 

We do not know a complete set of necessary and sufficient conditions on $A$ and $\Phi$ that guarantee analyticity of the entropy rate. However, in this section, we show how the hypotheses in Theorem 1.1 can be relaxed and still guarantee analyticity. We then give several examples. In Section 7 we do give a complete set of necessary and sufficient conditions for a very special class of hidden Markov chains.

In this section, we assume that $A$ has a simple maximum eigenvalue 1; this implies that $A$ has a unique stationary vector $s$.

For a mapping $f$ from $W_b$ to $W$ and $w \in W_b$, let $f'(w)$ denote the first derivative of $f$ at $w$ restricted to the subspace spanned by directions parallel to the simplex $W_b$, and let $\|\cdot\|$ denote the Euclidean norm of a linear mapping. We say that $\{f_a : a \in \mathcal{A}\}$ is eventually contracting at $w \in W_b$ if there exists $n$ such that for any $a_0, a_1, \ldots, a_n \in \mathcal{A}$, $\|(f_{a_n} \circ f_{a_{n-1}} \circ \cdots \circ f_{a_0})'(w)\|$ is strictly less than 1. We say that $\{f_a : a \in \mathcal{A}\}$ is contracting at $w \in W_b$ if it is eventually contracting at $w$ with $n = 0$. Using the mean value theorem, one can show that if $\{f_a : a \in \mathcal{A}\}$ is contracting at each $w$ in a compact convex subset $K$ of $W_b$, then each $f_a$ is a contraction mapping on $K$.
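Eventual contraction at a given point can be tested numerically. The sketch below is ours; the function names are hypothetical and finite differences stand in for exact derivatives. It builds an orthonormal basis of directions parallel to $W_b$, differentiates $f_{a_n} \circ \cdots \circ f_{a_0}$ along it, and returns the largest spectral norm over all words $a_0 \ldots a_n$; a value below 1 means the family is eventually contracting at that point with this $n$.

```python
import itertools

import numpy as np


def f(w, A, cols):
    """f_a(w) = w A_a / r_a(w), where cols lists the states j with Phi(j) = a."""
    v = np.zeros_like(w)
    v[cols] = w @ A[:, cols]
    return v / v.sum()


def tangent_basis(idx, B):
    """Orthonormal basis of directions parallel to the face W_b = {w : supp(w) in idx}."""
    E = np.zeros((B, len(idx) - 1))
    for c, i in enumerate(idx[1:]):
        E[idx[0], c], E[i, c] = -1.0, 1.0        # e_i - e_{idx[0]} sums to zero
    Q, _ = np.linalg.qr(E)
    return Q


def composition_norm(A, phi, word, w, b, h=1e-6):
    """Euclidean norm of (f_{a_n} o ... o f_{a_0})'(w) restricted to directions parallel to W_b."""
    colsets = [np.flatnonzero(phi == a) for a in range(phi.max() + 1)]

    def F(x):
        for a in word:                           # word = (a_0, ..., a_n); f_{a_0} is applied first
            x = f(x, A, colsets[a])
        return x

    Q = tangent_basis(colsets[b], A.shape[0])
    J = np.column_stack([(F(w + h * q) - F(w - h * q)) / (2 * h) for q in Q.T])
    return float(np.linalg.norm(J, 2))           # spectral norm of the derivative on the tangent space


def max_norm_over_words(A, phi, w, b, n):
    """Largest derivative norm over all words a_0 ... a_n."""
    symbols = range(phi.max() + 1)
    return max(composition_norm(A, phi, word, w, b) for word in itertools.product(symbols, repeat=n + 1))


rng = np.random.default_rng(3)
A = rng.uniform(1.0, 2.0, size=(4, 4))
A /= A.sum(axis=1, keepdims=True)                # hypothetical strictly positive stochastic matrix
phi = np.array([0, 0, 1, 1])
w = np.array([0.5, 0.5, 0.0, 0.0])               # a point of W_0
print([max_norm_over_words(A, phi, w, b=0, n=n) for n in range(4)])
```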

Let $L$ denote the limit set of $\{(f_{a_n} \circ f_{a_{n-1}} \circ \cdots \circ f_{a_0})(s)\}$.

Theorem 6.1. If at $A = \hat{A}$,

1. 1 is a simple eigenvalue of $A$,

2. for every $a$ and all $w$ in $L$, $r_a(w) > 0$,

3. for every $b$, $\{f_a : a \in \mathcal{A}\}$ is eventually contracting at all $w$ in the convex hull of the intersection of $L$ and $W_b$,

then $H(Z)$ is analytic at $A = \hat{A}$.

Proof. Let $X$ denote the right infinite shift space $\{a_0^\infty : a_i \in \mathcal{A}\}$. Let $L_\delta$ be the set of all points in $W$ of distance at most $\delta$ from $L$. Choose $\delta$ so small that

• for every $a \in \mathcal{A}$ and $w$ in $L_\delta$, $r_a(w) > 0$, and

• for every $b$, $\{f_a : a \in \mathcal{A}\}$ is eventually contracting at all $w$ in the convex hull of the intersection of $L_\delta$ and $W_b$.

Since the convex hull $K_\delta$ of the intersection of $L_\delta$ and $W_b$ is compact, there exists $n$ such that for any $a_0, a_1, \ldots, a_n \in \mathcal{A}$ and any $w \in K_\delta$, $\|(f_{a_n} \circ f_{a_{n-1}} \circ \cdots \circ f_{a_0})'(w)\|$ is strictly less than 1. For simplicity, we may assume that for each $a$, $\{f_a\}$ is contracting on $K_\delta$, and so each $f_a$ is a contraction mapping on $K_\delta$. Since $L_\delta \subset K_\delta$, it follows that $f_a(L_\delta) \subset L_\delta$, and so each $f_a$ is a contraction mapping on $L_\delta$.

For any $c_0^\infty \in X$, there exists $n$ such that $(f_{c_n} \circ f_{c_{n-1}} \circ \cdots \circ f_{c_0})(s) \in L_\delta$. Let $X_{c_0^n}$ denote the cylinder set $\{a_0^\infty : a_0 = c_0, a_1 = c_1, \ldots, a_n = c_n\}$. Since each $f_a$ is a contraction mapping on $L_\delta$, we conclude that for any $a_0^\infty \in X_{c_0^n}$ and all $m \ge n$, $(f_{a_m} \circ f_{a_{m-1}} \circ \cdots \circ f_{a_0})(s) \in L_\delta$. By the compactness of $X$, we can find finitely many such cylinder sets to cover $X$. Consequently, we can find $n$ such that for any $a_0^\infty \in X$ and any $m \ge n$, we have $(f_{a_m} \circ f_{a_{m-1}} \circ \cdots \circ f_{a_0})(s) \in L_\delta$. We can now apply the proof of Theorem 1.1; namely, we can use the contraction (along any symbolic sequence) to extend $H_n(Z) = H(Z_0 \mid Z_{-n}^{-1})$ from real to complex parameters and prove the uniform convergence of $H_n(Z)$ to $H(Z)$ in complex parameter space. □

Remark 6.2.

(1) If $A$ has a strictly positive column (or, more generally, there is a $j$ such that for all $i$, there exists $n$ such that $A^n(i, j) > 0$), then condition 1 of Theorem 6.1 holds by Perron-Frobenius theory.

(2) If for each symbol $a$, $A_a$ is row allowable (i.e., no row is all zero), then $r_a(w) > 0$ for all $w \in W$, and so condition 2 of Theorem 6.1 holds.

Theorem 6.1 relaxes the positivity assumptions of Theorem 1.1. Indeed, given conditions 1 and 2 of Theorem 1.1, by Remark 6.2 conditions 1 and 2 of Theorem 6.1 hold. For condition 3 of Theorem 6.1, first observe that $L$ is contained in $\bigcup_b f_b(W)$. Using the equivalence of the Euclidean metric and the Hilbert metric, Proposition 2.2 shows that for every $b$, $\{f_a : a \in \mathcal{A}\}$ is eventually contracting on $f_b(W)$, which is a convex set containing the intersection of $L$ and $W_b$.

Theorem 6.1 also applies to many cases not covered by Theorem 1.1. For instance, suppose that some column of $A$ is strictly positive and each $A_a$ is row allowable. By Remark 6.2, Theorem 6.1 applies whenever we can guarantee condition 3. For this, it is sufficient to check that for each $a$, $b$, $f_a$ is a contraction, with respect to the Euclidean metric, on the convex hull of the intersection of $L$ with each $W_b$. This can be done by explicitly computing derivatives.

Example 6.3. Consider a hidden Markov chain $Z$ defined by

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{pmatrix}$$

with $\Phi(1) = \Phi(2) = 0$ and $\Phi(3) = \Phi(4) = 1$. We assume that some column of $A$ is strictly positive and both $A_0$ and $A_1$ are row allowable.

Parameterize $W_0$ by $(y, 1 - y, 0, 0)$ and parameterize $W_1$ by $(0, 0, y, 1 - y)$ (with $y \in [0, 1]$). We can explicitly compute the derivatives of $f_0$ and $f_1$ with respect to $y$ (writing the image $f_a(w)$ as $(t, 1 - t, 0, 0)$ or $(0, 0, t, 1 - t)$ and differentiating $t$):

$$f_0'\big|_{(y, 1-y, 0, 0)} = \frac{a_{11}a_{22} - a_{12}a_{21}}{((a_{11} + a_{12} - a_{21} - a_{22})y + a_{21} + a_{22})^2},$$

$$f_0'\big|_{(0, 0, y, 1-y)} = \frac{a_{31}a_{42} - a_{32}a_{41}}{((a_{31} + a_{32} - a_{41} - a_{42})y + a_{41} + a_{42})^2},$$

$$f_1'\big|_{(y, 1-y, 0, 0)} = \frac{a_{13}a_{24} - a_{14}a_{23}}{((a_{13} + a_{14} - a_{23} - a_{24})y + a_{23} + a_{24})^2},$$

$$f_1'\big|_{(0, 0, y, 1-y)} = \frac{a_{33}a_{44} - a_{34}a_{43}}{((a_{33} + a_{34} - a_{43} - a_{44})y + a_{43} + a_{44})^2}.$$

Note that the row allowability condition guarantees that the denominators in these expressions never vanish.

Choose the $a_{ij}$'s such that each of these derivatives is less than 1 in absolute value; then we conclude that the entropy rate is analytic at $A$. One way to do this is to make each of the $2 \times 2$ upper/lower left/right matrices singular.
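The four closed forms can be checked against finite differences. In the sketch below (ours, with a hypothetical randomly generated positive stochastic $A$ and 0-based indices), `closed_form` evaluates the formulas above and `numeric_derivative` differentiates the coordinate $t$ of $f_a(w(y))$. Since the unit tangent $(e_p - e_q)/\sqrt{2}$ is mapped to a multiple of $(e_j - e_k)/\sqrt{2}$, the absolute value of this scalar is also the Euclidean norm used in the definition of contracting.

```python
import numpy as np


def f(w, A, cols):
    """f_a(w) = w A_a / r_a(w), where cols lists the (0-based) states with Phi = a."""
    v = np.zeros_like(w)
    v[cols] = w @ A[:, cols]
    return v / v.sum()


def closed_form(A, rows, cols, y):
    """Derivative formula of Example 6.3 on the edge w(y) = y e_p + (1 - y) e_q."""
    (p, q), (j, k) = rows, cols
    num = A[p, j] * A[q, k] - A[p, k] * A[q, j]
    den = (A[p, j] + A[p, k] - A[q, j] - A[q, k]) * y + A[q, j] + A[q, k]
    return num / den ** 2


def numeric_derivative(A, rows, cols, y, h=1e-6):
    """Central difference of coordinate cols[0] of f_a(w(y)) with respect to y."""
    def coord(t):
        w = np.zeros(A.shape[0])
        w[rows[0]], w[rows[1]] = t, 1.0 - t
        return f(w, A, list(cols))[cols[0]]
    return (coord(y + h) - coord(y - h)) / (2 * h)


rng = np.random.default_rng(4)
A = rng.uniform(0.1, 1.0, size=(4, 4))
A /= A.sum(axis=1, keepdims=True)
cases = [((0, 1), (0, 1)), ((2, 3), (0, 1)), ((0, 1), (2, 3)), ((2, 3), (2, 3))]
for rows, cols in cases:
    print(rows, cols, closed_form(A, rows, cols, 0.5), numeric_derivative(A, rows, cols, 0.5))
```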



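Or choose the $a_{ij}$'s such that

$$A = \begin{pmatrix} \alpha_1 & * & \beta_1 & * \\ 0 & \alpha_2 & 0 & \beta_2 \\ \lambda_1 & * & \eta_1 & * \\ 0 & \lambda_2 & 0 & \eta_2 \end{pmatrix},$$

where $0 < \alpha_1 < \alpha_2$, $0 < \beta_1 < \beta_2$, $0 < \lambda_1 < \lambda_2$, $0 < \eta_1 < \eta_2$ and $*$ denotes a positive real number. Let $(s_2, s_4)$ be the Perron eigenvector (the normalized stationary vector) of the stochastic matrix

$$\begin{pmatrix} \alpha_2 & \beta_2 \\ \lambda_2 & \eta_2 \end{pmatrix}.$$

Then $s = (0, s_2, 0, s_4)$ is the stationary vector of $A$ corresponding to the simple eigenvalue 1. Let $w_0 = (0, 1, 0, 0)$ and $w_1 = (0, 0, 0, 1)$. One checks that for $n \ge 0$, $f_{a_n} \circ f_{a_{n-1}} \circ \cdots \circ f_{a_0}(s) = w_{a_n}$. Therefore $L$ consists of $\{w_0, w_1\}$. Using the expressions above, we see that

$$f_0'\big|_{w_0} = \alpha_1/\alpha_2 < 1, \qquad f_0'\big|_{w_1} = \lambda_1/\lambda_2 < 1,$$

$$f_1'\big|_{w_0} = \beta_1/\beta_2 < 1, \qquad f_1'\big|_{w_1} = \eta_1/\eta_2 < 1.$$

So $f_0$ and $f_1$ are contracting at $\{w_0, w_1\}$, and so condition 3 holds. Thus, the entropy rate $H(Z)$ is analytic at $A$.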
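The special construction can be verified numerically for one admissible choice of parameters. The values below are hypothetical (any numbers with $0 < \alpha_1 < \alpha_2$, etc., positive $*$ entries and rows summing to 1 would do), and indices are 0-based, so $w_0$ and $w_1$ sit at positions 1 and 3.

```python
import numpy as np

# Hypothetical values with 0 < alpha1 < alpha2, 0 < beta1 < beta2, 0 < lambda1 < lambda2, 0 < eta1 < eta2.
al1, al2, be1, be2 = 0.1, 0.5, 0.2, 0.5
la1, la2, et1, et2 = 0.15, 0.4, 0.25, 0.6
A = np.array([[al1, 0.3, be1, 0.4],     # 0.3, 0.4, 0.35, 0.25 fill the '*' entries
              [0.0, al2, 0.0, be2],
              [la1, 0.35, et1, 0.25],
              [0.0, la2, 0.0, et2]])
phi = np.array([0, 0, 1, 1])            # Phi(1) = Phi(2) = 0, Phi(3) = Phi(4) = 1


def stationary(A):
    B = A.shape[0]
    M = A.T - np.eye(B)
    M[-1, :] = 1.0
    rhs = np.zeros(B)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)


def f(w, a):
    """f_a(w) = w A_a / r_a(w)."""
    v = np.where(phi == a, w @ A, 0.0)
    return v / v.sum()


def edge_derivative(a, b):
    """Closed-form derivative of Example 6.3 for f_a on W_b, evaluated at y = 0, i.e. at w_b."""
    p, q = (0, 1) if b == 0 else (2, 3)         # rows spanning W_b
    j, k = (0, 1) if a == 0 else (2, 3)         # columns with Phi = a
    return (A[p, j] * A[q, k] - A[p, k] * A[q, j]) / (A[q, j] + A[q, k]) ** 2


s = stationary(A)
w = {0: np.array([0.0, 1.0, 0.0, 0.0]), 1: np.array([0.0, 0.0, 0.0, 1.0])}
print(s, [edge_derivative(a, b) for a in (0, 1) for b in (0, 1)])
```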



7 Hidden Markov Chains with Unambiguous Symbol 

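Definition 7.1. A symbol $a$ is called unambiguous if $\Phi^{-1}(a)$ contains only one element.

Remark 7.2. Note that an unambiguous symbol is referred to as a "singleton clump" in some ergodic theory work, such as [23].

When an unambiguous symbol is present, the entropy rate can be expressed in a simple way: letting $a_i$ be an unambiguous symbol,

$$H(Z) = \sum_{n \ge 0} \sum_{a_{i_1}, \ldots, a_{i_n} \ne a_i} p(a_{i_n} a_{i_{n-1}} \cdots a_{i_1} a_i)\, H(z \mid a_{i_n} a_{i_{n-1}} \cdots a_{i_1} a_i), \qquad (7.21)$$

where in each string the most recent symbol is written leftmost, so that $a_i$ is the oldest symbol.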
In this section, we focus on the case of a binary hidden Markov chain in which 0 is unambiguous. Then we can rewrite (7.21) as

$$H(Z^\varepsilon) = p^\varepsilon(0)H^\varepsilon(z \mid 0) + p^\varepsilon(10)H^\varepsilon(z \mid 10) + \cdots + p^\varepsilon(1^{(n)}0)H^\varepsilon(z \mid 1^{(n)}0) + \cdots, \qquad (7.22)$$

where $1^{(n)}$ denotes the sequence of $n$ 1's and

$$H^\varepsilon(z \mid 1^{(n)}0) = -p^\varepsilon(0 \mid 1^{(n)}0) \log p^\varepsilon(0 \mid 1^{(n)}0) - p^\varepsilon(1 \mid 1^{(n)}0) \log p^\varepsilon(1 \mid 1^{(n)}0).$$
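When the unambiguous symbol 0 is produced only by the first state, every term of (7.22) is a finite matrix product, so the series can be summed numerically. The sketch below is ours; the function names and the test matrix are hypothetical, and state 0 in the code plays the role of state 1 in the text. It truncates (7.22) and compares the result with the finite-past conditional entropy $H(Z_0 \mid Z_{-n}^{-1})$, which decreases to $H(Z)$.

```python
import numpy as np


def stationary(A):
    B = A.shape[0]
    M = A.T - np.eye(B)
    M[-1, :] = 1.0
    rhs = np.zeros(B)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)


def h2(q):
    """Binary entropy in nats, with 0 log 0 = 0."""
    return -sum(t * np.log(t) for t in (q, 1.0 - q) if t > 0)


def entropy_rate_series(A, n_terms=400):
    """Truncation of (7.22); state 0 is the only state emitting symbol 0."""
    pi = stationary(A)
    out_row = A[0, 1:]       # leave the unambiguous state into the '1' states
    M = A[1:, 1:]            # keep emitting symbol 1
    back = A[1:, 0]          # return to the unambiguous state (emit 0)
    H = pi[0] * h2(A[0, 0])  # n = 0 term: p(0) H(z | 0)
    total = pi[0]
    v = out_row.copy()       # v = out_row M^(n-1)
    for _ in range(1, n_terms):
        weight = pi[0] * v.sum()                 # p(1^(n) 0)
        H += weight * h2(v @ back / v.sum())     # p(0 | 1^(n) 0) = (v . back) / (v . 1)
        total += weight
        v = v @ M
    return H, total          # total should be close to 1


def conditional_entropy(A, phi, n):
    """H(Z_0 | Z_{-n}^{-1}) for n >= 1, by enumerating all blocks of length n and n + 1."""
    mats = [A * (phi == a)[None, :] for a in (0, 1)]
    alphas = stationary(A)[None, :]
    block = []
    for _ in range(n + 1):
        alphas = np.concatenate([alphas @ Aa for Aa in mats])
        p = alphas.sum(axis=1)
        p = p[p > 0]
        block.append(float(-(p * np.log(p)).sum()))
    return block[n] - block[n - 1]


A = np.array([[0.4, 0.3, 0.3],
              [0.5, 0.2, 0.3],
              [0.5, 0.3, 0.2]])
H, total = entropy_rate_series(A)
print(H, total, conditional_entropy(A, np.array([0, 1, 1]), 16))
```

The brute-force value can only exceed the series value by the contribution of pasts ending in $n$ consecutive 1's, whose probability here decays like $2^{-n}$.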
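Example 7.3. Fix $a, b, \ldots, h > 0$ and for $\varepsilon > 0$ let

$$A(\varepsilon) = \begin{pmatrix} \varepsilon & a - \varepsilon & b \\ g & c & d \\ h & e & f \end{pmatrix}.$$

Assume $a, b, \ldots, h > 0$ are chosen such that $A(\varepsilon)$ is stochastic. The symbols of the Markov chain are the matrix indices $\{1, 2, 3\}$. Let $Z^\varepsilon$ be the binary hidden Markov chain defined by $\Phi(1) = 0$ and $\Phi(2) = \Phi(3) = 1$. We claim that $H(Z^\varepsilon)$ is not analytic at $\varepsilon = 0$.

Let $\pi(\varepsilon)$ be the stationary vector of $A(\varepsilon)$ (which is unique since $A(\varepsilon)$ is irreducible). Observe that

$$p^\varepsilon(0) = \pi_1(\varepsilon),$$

and for $n \ge 1$,

$$p^\varepsilon(1^{(n)}0) = \pi_1(\varepsilon)\,(a - \varepsilon, b) \begin{pmatrix} c & d \\ e & f \end{pmatrix}^{n-1} \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Since $A(\varepsilon)$ is irreducible, $\pi(\varepsilon)$ is analytic in $\varepsilon$ and positive. Now,

$$p^\varepsilon(0)H^\varepsilon(z \mid 0) = -p^\varepsilon(00)\log p^\varepsilon(0 \mid 0) - p^\varepsilon(10)\log p^\varepsilon(1 \mid 0). \qquad (7.23)$$

The first term in (7.23) is

$$-p^\varepsilon(00)\log p^\varepsilon(0 \mid 0) = -\pi_1(\varepsilon)\,\varepsilon \log \varepsilon,$$

which is not analytic (or even differentiable) at $\varepsilon = 0$. The second term in (7.23) is

$$-p^\varepsilon(10)\log p^\varepsilon(1 \mid 0) = -\pi_1(\varepsilon)(a - \varepsilon + b)\log(a - \varepsilon + b),$$

which is analytic at $\varepsilon = 0$. Thus, $p^\varepsilon(0)H^\varepsilon(z \mid 0)$ is not analytic at $\varepsilon = 0$. Similarly, it can be shown that all of the terms of (7.22) other than $p^\varepsilon(0)H^\varepsilon(z \mid 0)$ are analytic at $\varepsilon = 0$. Since the matrix

$$\begin{pmatrix} c & d \\ e & f \end{pmatrix}$$

has spectral radius less than 1, the terms of (7.22) decay exponentially; it follows that the infinite sum of these terms is analytic. Thus, $H(Z^\varepsilon)$ is the sum of two functions of $\varepsilon$, one of which is analytic and the other of which is not analytic at $\varepsilon = 0$. Thus, $H(Z^\varepsilon)$ is not analytic at $\varepsilon = 0$.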
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Example 7.4. Fix a,b,--- ,g>0 and consider the stochastic matrix 



A{e) 



e ah 
f-e c e 
g d 



The symbols of the Markov chain are the matrix indices {1, 2, 3}. Again let be the binary 
hidden Markov chain defined by $(1) = and $(2) = $(3) = 1. We show that H{Z^) is 
analytic at £ = when d, and not analytic when c — d. Note that 



and for n > 1. 



p^(l(")0) = 7ri(£)(a,6) 
When c ^ d, we assume c> d, then 



c 


e 


n— 1 ^ 





d _ 





p-(l|l(«)0) = {ad' + aec' 



c 


e 


n 





d 




■) is 


analytic 


-il 


- {d/cT 




1 - 


d/c 



l—a/c 







d" 



hd'')/{a6 



I- d/c 



I -d/c 



I- d/c 



and 



p^(0|l("^0) = ((/-£)ac"-^+^(a£d 



-2^ -{d/c 



iW— 1 



1-d/c 



fM'*-^))/(ac"-^+a£a 



1 - {d/c) 



n— 1 



1-d/c 



= ((/ - £)ac + g{ae- W^Ll. + hd{d / cy-'')) / {ac + £^ W^ll^ + hdid/cY'''). 



I- d/c 

In this case all terms are analytic. Again since 



I -d/c 



c e 
d 

has spectral radius < 1, the term p^(l(")0)if^(z|l(")0) is exponentially decaying with respect 
to n. Therefore the infinite sum of these terms is also analytic, and so the entropy rate is a 
real analytic function of e. 
When c = d, we have 

p"(l|l(")0) = (ac"+' + ae{n + l)c" + 6c"+')/(ac" + a£nc"-' + 6c") 

= (ac^ + a£(n + l)c + bc^)/ (ac + aen + be), 
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and 

p^(0|l(")0) = ((/ - £)ac" + gaenc''-^ + ghd')/{ad' + aend"-^ + 6c") 

= ((/ — e)ac + gaen + gbc)/{ac + aen + he). 

For any n, consider a small neighborhood A^„ of —(a+6)c/ an in C such that — (a+6)c/ aj G A^n 
only holds for j = n. When e —(a + b)c/an, the complexified term p^{1^^^0)H'^{z\l^"^0) 
oo. Meanwhile, the sum of all the other terms can be analytically extended to Nn (from any 
path / from a positive e to —(a + b)c/an with —(a + b)c/aj ^ / for j ^ n). Thus, by the 
uniqueness of analytic continuation of HlZ'^), we conclude that H{Z'^) blows up when one 
approaches —(a + b)c/an and therefore is not analytic at £ = (although it is smooth from 
the right at e = 0). 

The two examples above show that under certain conditions the entropy rate of a binary 
hidden Markov chain with unambiguous symbol can fail to be analytic at the boundary. We 
now show that these examples typify all the types of failures of analyticity at the boundary 
(in the case of a binary hidden Markov chains with an unambiguous symbol). 

We will need the following result. 

Lemma 7.5. Let A{e) be an analytic parameterization of complex matrices. Let A be the 
spectral radius of A{eQ). Then for any t] > 0, there exists a complex neighborhood Q of 
and positive constant C such that for all e& Q and all i,j, k 

14(^11 <C(A + #. 

Proof. Following [2Z|, we consider 

{I-zA)-^ = I + zA + z''A^ + --- . 

And 



det(/ - zA) (1 - Ai^)(l - X2Z) ■■■{!- A„z) ' 
where Ai, . . . , A„ are the eigenvalues of A. So every entry of (/ — zA)^^ takes the form: 

n CO 

{po+piz+---+p^z'^)m2^w 

j=l i=0 

00 m 

= J2J2p- E ArA----Atr;^^ 

fc=0 u=0 ii+i^-i hin=k—u 

Since the eigenvalues of a complex matrix vary continuously with entries, the lemma follows. 

□ 

Now let S{n) denote the set of all the nx n complex matrices with isolated (in modulus) 
maximum eigenvalue. 

Lemma 7.6. S{n) is connected. 
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Proof, let A,B & S{n), then we consider their Jordan forms: 

A = [/diag (Ai, C)U-\ B = Vdiag {r]i, D)V-\ 

here Ai, r^i are maximum eigenvalues for A, B, respectively, C, D correspond to other Jordan 
blocks, and U,V E GL{n, C) (here GL{n, C) denotes the set of all the n x n nonsingular 
complex matrices). Since GL{n, C) is connected ^Hl; it suffices to prove that there is a path 
in S{n) from diag (Ai, C) to diag (r^i, D). This is straightforward: first connect diag (Ai, C) to 
diag {rji, rji/XiC) by a continuous rescaling; then connect r/i/AiC to D by the path trji/XiC + 
(1 — t)D (the path diag (r/i, trji/XiC + (1 — t)D) stays within S{n) since the matrices along 
this path are upper triangular with all diagonal entries, except rji, of modulus less than 
\Vi\)- □ 

For a complex analytic function f{zi,Z2,--- ,Zn), let V{f) denote the "hypersurface" 
defined by /, namely 

V{f) = {(Zi, Z2,---, Zn) e C" : /(Zi, Z2, ■ ■ ■ , Zn) = 0}. 

Now let VL denote a connected open set in C". It is well known that the following Lemma 
holds (for completeness, we include a brief proof). 

Lemma 7.7. VL\V{f) is connected. 

Proof. For simplicity, we first assume is a ball -Br (2^0) (here zq G C" is the center of the 
ball and r is the radius, i.e., Br{zo) = {z G : I2; — zq\ < r}) in C". For any two distinct 
point P,Q E Q\V{f), consider the "complex line" 

= {zP + (l-z)Q: ze C}. 

L^^ nV{f)nfl consists of only isolated points (A non-constant one variable complex analytic 
function must have isolated zeros in the complex plane j2H|)- It then follows that for the 
compact real line segment: 

4« = {tP+(l-t)Q:tG[0,l]}, 

n V{f) n Q consists of only finitely many points. Certainly one can choose an arc in 
Lj.*^ n f2 to avoid these points and connect P and Q. This implies that Q\V{f) is connected. 

In the general case, f2 is a connected open set in C". Let / be an arc in Q connecting P and 
Q, and let {Brj{zj)} be a collection of balls covering / such that each Br^i^Zj) n5r-j^i(%+i) 7^ 
0. Pick a point Pj in Br^{zj) fl Br^^^{zj+i) such that Pj G Vl\V{f). Applying the same 
argument as above to every ball Br {zj), we see that P is connected to Q in Q\V{f) through 
the points -P/s. Thus we prove the lemma. □ 



Theorem 7.8. Let A be an irreducible stochastic d x d matrix. Write A in the form: 

(7.24) 



a r 
c B 



where a is a scalar and B is a [d — 1) x [d — 1) matrix. Let $ be the function defined 
by ^{1) = 0, and $(2) = ■ ■ ■ = ^{d) = 1. Then for any parametrization A{e) such that 
A{6o) = A, letting Z"^ denote the hidden Markov chain defined by A{e) and $, H{Z^ is 
analytic at Eq if and only if 
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1. a > 0, and rB^c > for j = 0,1, ■ ■ ■ . 

2. The maximum eigenvalue of B is simple and strictly greater in absolute value than the 
other eigenvalues of B. 

Proof. Proof of sufficiency. 

We write 

a{e) r{e) 



cie) Bis) 



(7.25) 



where a{e) is a scalar and B{e) is a {d — 1) x [d — 1) matrix. 

Since A(eo) is stochastic and irreducible, its spectral radius is 1, and 1 is a simple 
eigenvalue of A. Thus, if Q is sufficiently small, for all e ^ Q, any fixed row 7r(e) = 
{ni{e) , H2{e) , ■ ■ ■ , vrrf(e)) of Adj{I — A{e)) is a left eigenvector of A{e) associated with eigen- 
value 1 and is an analytic function of e. Normalizing, we can assume that vr(e)l = 1, 7i{e) 
is analytic in e, and vr(eo) > 0. 

The entries of r{6),B{e), and c{e) are real analytic in e and can be extended to complex 
analytic functions in a complex neighborhood Q of Eq. Thus, for all n, 7Ti{e)r{e)B[e)"'^^l 
and 'n'i{e)r{e)B{e)'^~^c{e) can be extended to complex analytic functions on Q (in fact, each 
of these functions is a polynomial in e). 

Since B{eo) is a proper sub-matrix of the irreducible stochastic matrix A{eo), its spectral 
radius is strictly less than 1. Thus, by Lemma (7. 5| there exists < A* < 1 and a constant 
Ci > 0, such that for some complex neighborhood Q of Eq, all e & Q, and all n, 

|Si;.(e)|<Ci(AT. 

Since iiiie), r{e) and c{e) are continuous in e, there is a constant C2 > such that for all 
e&Q and all n: 

|7ri(£)r(£)5(e)"l| < C2(A*)". (7.26) 
We will need the following result, proven in Appendix iBl 
Lemma 7.9. Let 

,^ 7ri(£)r(e)5(e)"l 



and 

b{e, n 



7ri(5)r(e)5(e)"-il 
7ri(£)r(e)5(e)"-ic(e) 



'Ki{e)r{e)B{e)''-n ' 

For a sufficiently small neighborhood Vt of Eq, both a{e, n) and b{e, n) are bounded from above 
and away from zero, uniformly in e eVL and n. 

Define 

Hn = —a{e, n) log a{e, n) — b{e, n) log b{e, n), 

where a{e,n) and b{e,n) are as in Lemma f7. 91 Choosing f2 to be a smaller neighborhood of 
Eq, if necessary, a{e,n) and b{e,n) are constrained to lie in a closed disk not containing 0. 
Thus for all n, is an analytic function of e, with \H^\ bounded uniformly in e& Q and n. 
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Since 'n'i{e)r[e)B(e)"' ^1 is analytic on Q and exponentially decaying (by ()7.26p ). the infinite 
series 

H'^Z) = ix^{e)Hl+ Ti^{e)r{e)lHl + ■ ■ ■ + T:^{e)r{e)B{e)''-HHi+ ■ ■ ■ (7.27) 

converges uniformly on Vt and thus defines an analytic function on Vt. 
Note that for e > 0, 

/(l(")0) = 7ri(£)r(e)5(e)"-4 (7.28) 

and 

/(0l(")0) = Tii{e)r{e)B{e)''-^c{e). (7.29) 

By ()7.28|) . ()7.29|) . and the expression for entropy rate in the case of an unambiguous symbol 
(given at the beginning of this section), H^{Z) agrees with the entropy rate when A(£) > 0, 
as desired. 

Remark 7.10. We show how sufficiency relates to Theorem Ifj.ll Namely, the assumptions 
in Theorem 17.81 imply those of Theorem Ifj.ll Condition 1 of Theorem Ifi.ll follows from the 
fact that A is assumed irreducible. For conditions 2 and 3 of Theorem 16. ![ one first notes 
that the image of /o is a single point Wq, and the /i -orbit of Wq and /i-orbit of s converge to 
a point pi. It follows that L is the union of Wq, the /i-orbit of Wq and pi. The assumptions 
in Theorem 17.81 imply that > on L (i.e., condition 2 of Theorem 16.11 holds) and that 
for sufficiently large n, the n-fold composition of /i is contracting on the convex hull of the 
intersection of L and Wi (so condition 3 of Theorem 16 . 1 1 holds) . To see the latter, one uses 
the ideas in the proof of sufficiency. 

Proof of necessity 

We first consider condition 2. We shall use the natural parameterization and view H{Z) 
as a function of A, or more precisely of (S, r). Note that there is a one-to-one correspondence 
between A and {B,r); we shall use this correspondence throughout the proof. 

Suppose A doesn't satisfy condition 2, however H{Z) is analytic at A with respect to the 
natural parameterization. In other words, suppose there exists a complex neighborhood 
of A (here A^a corresponds to Nb x A^^ where A''^ is neighborhood of B and Nr is neighborhood 
of r) such that H{Z) can be analytically extended to A^a, while the corresponding B doesn't 
have isolated (in modulus) maximum eigenvalue. 

We first claim there exists A G A^a with rB'^l = 0, here f and B correspond to A and 
B has distinct eigenvalues (in modulus). Indeed we can first (for simplicity) perturb A to A 
such that the corresponding B has distinct eigenvalues in modulus. Then 

5 = [/diag(Ai,A2,--- ,A,_i)t/-i 

= {vi,V2,--- ,5d-i)diag(Ai,A2,-- - ,Xd-i)iw\,wl,--- ,Wd-iY 

where |Ai| > IA2I > • ■ ■ > |Arf_i|, and Vi,WiS are appropriately scaled right and left eigen- 
vectors of B, respectively. Then we have 

rB''l = rviWilX'l + rv2W2'^^ H h rVd-iWd-i^^-v 
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Further consider a perturbation of B from 



fi = t/diag(Ai,A2,--- ,Ad-i)t/~' 

to 

B = VUdiag (Ai, A2, ■ ■ ■ , \d-i)U-'V-\ 

where \^ is a complex matrix close to the {d — 1) x [d — 1) identity matrix Id-i- So we can 
pick V such that ViWiV~^l 7^ 0, ViWiV~^c 7^ 0, V2W2V~^1 7^ 0. Clearly viWiV^^l is not 
proportional to V2W2V~^1. Then by a further perturbation of r to f, we can simultaneously 
require that fviWil 7^ 0, rviWiC 7^ 0, fv2W2l 7^ 0, |ft;iWil| 7^ |f{;2W2l|, where we redefine 
•Vi = Vvi and Wi = WiV~^. For any 6 and rj > 0, it can be checked that 

00 

[j{z':\z-e^'\<v} = C\{0}. 

k=0 

Since A2 is a perturbation of Ai, it follows that for large enough k, one can perturb A2 to 
satisfy the equation 

^fe _ -f^iWil - f{;3W3l(A3/Ai)^ f{;rf_iWd_il(Ad_i/Ai)* 

^2/ ^1 



rV2W2l 

with IA2I 7^ I All and IA2I strictly greater than |Aj| for j > 3. Thus we prove the claim. 

We now pick a positive matrix A G A^a with corresponding f and B. We then pick 
A G A^A with corresponding f and B (with distinct eigenvalues in modulus) such that 
rB^^l = for some ki, and we can further require that rviWil 7^ 0, rviWiC 7^ (see the 
proof for the previous claim), where as before, vi,wi are eigenvectors corresponding to the 
largest eigenvalue of B. According to Lemma f7.fj| there is an arc Ii C S{d — 1) connecting 
B to B; we then connect f and f using an arc I2 in C^"^. According to Lemma \7. 7\ we can 
choose the arc / = (/i,/2) to avoid the hypersurface V{{rviWil){rviWic)) C C*^'^"^-' x C^"^; 
in other words, we can assume that along the path J, rviWil 7^ and rviWic 7^ 0; here 
Vi,Wi,c are determined by the variable matrix B along the path Ji and r is the variable 
point along path I2 (we remind the reader that the coordinates of Vi and Wi are all analytic 
functions of the entries of B). We then claim that there is a neighborhood Nj of I such that 
Vfc n Nj 7^ (j) and Wk H Nj 7^ (j) hold for only finitely many k, where 14 = {{B,r) : rB'^l = 0} 
and Wk = {{B, r) : rB^c = 0}. Indeed for any A G / with corresponding B G S{d — 1), by 
the Jordan form we have 

rE'^l = rviWilAt + o(At), 

where Ai is the isolated maximum eigenvalue and vi,wi are appropriately scaled right and 
left eigenvectors of B, respectively. Since rviWil 7^ on /, there exists a complex connected 
neighborhood Nj of I such that rviWil ^ on Nj and rviWilXi dominates uniformly on 
Nj (see Lemma l7.5|) . Consequently, |ri?*^l| > on Nj for large enough k. In other words, 
Vk r\ Nj ^ (j) holds for only finitely many k. Similarly since rviWiC 7^ on /, there exists 
a complex neighborhood Nj of / (here we use the same notation for a possibly different 
neighborhood) such that Wk (1 Nj ^ (p holds only for finitely many k. From now on, we 
assume such fc's are less than some K, which depends on Nj. 
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We claim that we can further choose I and find a new neighborhood Ni in C^^^ x S{d — 1) 
of / such that fl A?^/ 7^ holds only for k = ki and (1 Nj = (p for all k. Consider A 
with corresponding B, let Fi = Fi{B) = {r : rB^l = 0}, which is a hyperplane orthogonal 
to the vector B^l in C^"^. Similarly we define Gi = Gi{B) = {r : rB^c = 0}. Recall 
that B = f/diag (Ai, A2, ■ ■ ■ ,Xd-i)U~^; we can require that U~^l has no zero coordinates 
by a small perturbation of U if necessary. We then show that Fj's and G^'s define different 
hyperplanes in C'^~^ Indeed suppose Fi = Fj. It follows that f/diag (A*^, Ag, ■ ■ ■ , A^_]^)f/"^1 
is proportional to t/diag (A{, A2, ■ ■ ■ , \^ll_-^)U^^l. It then follows that - , A^__]^) is 

proportional to (A^, A2, ■ ■ ■ , ^d-i)- However since not all eigenvalues have the same modulus, 
this implies that i = j. With a perturbation of c (equivalently a perturbation of row sums 
of B), if necessary, we conclude that the Fj's and Gj's determine different hyperplanes, i.e., 
Fi 7^ Fj, Gi 7^ Gj for i ^ j < K, and Fj 7^ for all Thus, with a perturbation of f if 
necessary, we can choose a new A contained in Vk^ , but not contained in any Vk with k ^ ki 
or IVfc for all k. Again by Lemma 17. 7^ one can choose a new / inside original Nj, connecting 
A and A, to avoid all V^'s and H4's except V^^, then choose a smaller new neighborhood Ni 
of the new / to make sure that H iVj 7^ only holds for k = ki and Wk nNj = (j) for all k. 

Since the perturbed complex matrix B still has spectral radius strictly less than 1, all the 
complexified terms in the entropy rate formula (see ()7.27p ) with k ^ ki are exponentially 
decaying and thus sum up to an analytic function on Ni.{i.e., the sum of these terms can be 
analytically continued to Ni), while the unique analytic extension of the fci-th term on Ni 
blows up as one approaches Vk^ fl Ni from A. Again by the uniqueness of analytic extension 
of H{Z) on Nj, this would be a contradiction to the assumption that H{Z) is analytic at 
A (here we are applying the uniqueness theorem of analytic continuation of a function of 
several complex variables, see page 21 in |2H1)- Thus we prove the necessity of condition 2. 

We now consider condition 1. Suppose A doesn't satisfies condition 1, namely a = or 
rB'^c = for some k, however H{Z) is analytic at A. With the proof above for the necessity 
of condition 2, we can now assume the corresponding B G S{d — 1). 

If a = 0, consider any perturbation of A to Ai such that B G S{d — 1), rviWil 7^ 0, 
fviWiC 7^ 0, fB^l 7^ and fB^c 7^ for all k (here we follow the notation as in the proof 
of necessity of condition 2). Then using similar arguments, we can prove the sum of all 
the terms except the first term in the entropy rate formula (see (j7.27|) ) can be analytically 
extended to A. However this implies that a log a is a well-defined analytic function on some 
neighborhood of in C, which is a contradiction. Similar arguments can be applied to the 
case that rB^c = for some /c's. Thus we prove the necessity of condition 1. □ 

8 Analyticity of a Hidden Markov Chain in a Strong 
Sense 

In this section, we show that if A is analytically parameterized by a real variable vector e, 
and at Eq, A satisfies conditions 1 and 2 of Theorem ll.il then the hidden Markov chain itself 
is a real analytic function of e at 5o in a strong sense. We assume (for this section only) that 
the reader is familiar with the basics of measure theory and functional analysis fW\ I3H IT7j. 
Our approach uses a connection between the entropy rate of a hidden Markov chain and 
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symbolic dynamics explored in |15j . 

Let X denote the set of left infinite sequences with finite alphabet. A cylinder set is a 
set of the form: {{x^^ : xq = zq, ■ ■ ■ , X-n = z^n})- The Borel sigma-algebra is the smallest 
sigma-algebra containing the cylinder sets. A Borel probability measure (BPM) z/ on A' is a 
measure on the Borel measurable sets of X such that = 1. Such a measure is uniquely 
determined by its values on the cylinder sets. 

For real e, consider the measure z/^ on X defined by: 



Usually, the Borel sigma-algebra is defined to be the smallest sigma-algebra containing 
the open sets; in this case, the open sets are defined by the metric: for any two elements C, 
and 7] in X, define d{^,7]) = where k = inf{|z| : 7^ rji}. The metric space {X,d) is 
compact. 

Let C{X) be the space of real- valued continuous functions on X. Then C{X) is a Banach 
space (i.e., complete normed linear space) with the sup norm ||/||oo = sup{|/(a;)| : x G X}. 
Then any BPM u acts as a bounded linear functional on C{X), namely z/(/) = J fdv. 
As such, the set of BPM's is a subset of the dual space, C{X)*, which is itself a Banach 
space; the norm of a BPM z/ is defined: ||z/|| = s\x]iyf^(j(^x):\\f\\^=i} I f^^- fs^ct, since X is 
compact, C{X)* is the linear span of the BPM's. 

It makes sense to ask if e ^ is analytic as a mapping from the parameter space 
to C{X)*] by definition, this would mean that z/^ can be expressed as a power series in 
the coordinates of e. However, as the following example shows, this mapping is not even 
continuous. 

Let X be the set of binary left infinite sequences. Let Up denote the i.i.d. (p, I — p) 
measure, with < p < 1. Let 

Sp = {x e X : lim (l/n)(logp:i., + . . . logp^_„) = -plogp - {1 - p) log(l - p)}. 

n—*oo 

Note that Sp is a Borel measurable set. By the strong law of large numbers, i^p{Sp) = 1. 
Clearly, for distinct p, Sp are disjoint. Thus, for q ^ p, i^q{Sp) = 0. 

Any Borel measurable set S can be approximated by a finite union of cylinder sets in the 
following sense: given 6 > and p G (0, 1), there is a finite union C of cylinder sets such 
that \i^q{S) — z/q(C)| < 6 for all g in a neighborhood of p. Applying this fact to S = Sp, and 
denoting C(^p^s) = C, we obtain 

1 = Up{Sp) - Ug{Sp) < \upiSp) - z/p(C(p,5))| + Wp{C(^p,s)) - MC(^p,s))\ + \MC{p,s)) - I'qiSp)] 

<25+ |z/p(C(p,5)) - z^g(C(p,5))|. 

If 6 < 1/2, then z/g(C(p^5)) cannot converge to i^p{C(p^s)) as g — * p. Since the characteristic 
function of a finite union of cylinder sets is continuous, this shows that the map p ^ Vp from 
TZ to C{X)* is discontinuous. 



Z/^({X_^ ■.XQ = ZQ,---,X-n = 



Z.n})=f{z\). 



(8.30) 



Note that H{Z) can be rewritten as 




(8.31) 
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On the other hand, using the work of Ruelle [21], we now show that e^u'^is analytic 
as a mapping from the parameter space to another natural space. 

For / e C{X), define var^if) = sup{|/(0 - : = C-i for ^ < n}. We denote by 

the subset of / G C{X) such that 

11/11, = sup(r"mr„(/))< +00. 

n>0 

is a Banach space with the norm ||/|| = max(|/|^, Using complex functions 

instead of real functions, one defines F^ similarly. 

In the following theorem, we prove the analyticity of a hidden Markov chain in a strong 
sense. 

Theorem 8.1. Suppose that the entries of A are analytically parameterized by a real variable 
vector e. If at e = Eq, A satisfies conditions 1 and 2 in Theorem then the mapping 
e (— > logp^(zok-i,) is analytic at Eq from the real parameter space to F^ (here p is the 
contraction constant in the proof of Theorem M . 1\) . Moreover the mapping e^v^is analytic 
at Sq from the real parameter space to {F'')* , the dual space (i.e., bounded linear functional) 
on FP. 

Proof. For complex e, by ()4.16p . one shows that \ogp'^{zQ\zZla) can be defined on fl^ as the 
uniform (in e and z & X) limit of \ogp'^{zQ\zZn) as n — > 00, and \ogp{zQ\zZlo) belongs to F^. 
By flI3|l . and it follows that p^{zo\zZn) is analytic on fie- As a result of 

(I4.16p . if A satisfies conditions 1 and 2, for fixed z E X, logp'^(zo|^Z(^) is the uniform limit 
of analytic functions and hence is analytic on (see Theorem 2.4.1 of |29j). 

Using ()4.16p and the Cauchy integral formula in several variables [211 (which expresses 
the derivative of an analytic function at a point as an integral of a closed curve around 
the point), we obtain the following. There is a positive constant C such that whenever 
2;° 00 ~ foi' all eeQc 

\D,i\ogp%zo\zZl)) - D,{\og/{zo\zZ'J)\ < C p\ (8.32) 

Therefore for arbitrary yet fixed z^_^, the components of the derivatives of \ogp^ {zq\zZ\o) 
with respect to e are also in F^. 

Furthermore, we prove that the mapping e ^— > \ogp'^[zo\zZ]^) is complex differentiable 
(therefore analytic) from f^c to F^^. Let f{s;-) = logp'^(-). It suffices to prove that 

\\f{e + h- ■) - /(5; ■) - D,f\,{h; OIL < oih). (8.33) 

and 

11/(5 + h- ■) - fie; ■) - D^flS; ■)\\e< oih). (8.34) 

Again applying the Cauchy integral formula in several variables, it follows that there 
exists a positive constant C" such that for all e G fic we have 

\Dlf\,iKh;z)\<C''\h\'' (8.35) 

and whenever 2;°^ ~ -^-oo' 

fil - t)\iDlf\,ih, h; z) - DlfUih, h; z))\dt < C"\h\'p\ (8.36) 
Jo 
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From the Taylor formula with integral remainder, we have: 

f{e + h- z) - f{e- z) - Dgf\g{h- z) = f (1 - t)Z}|/|^-.^,^(/^, h- z)dt. (8.37) 

To prove use and (lOTjl . To prove use and (jOTj) . Therefore e ^ 

logp'^(-) is analytic as a mapping from Vt^ to F^. Restricting the mapping e ^ logp'^ {zq\zzI^) 
to the real parameter space, we conclude that it is real analytic (as a mapping into F^). Using 
this and the theory of equilibrium states j21]), the "Moreover" is proven in Appendix O CH 

Corollary 8.2. Suppose that at Eq, A satisfies conditions 1 and 2 in Theorem li.il and 
e ^ E F'' be analytic at Eq, then e i— //^(/"^ is analytic at £*o- particular, we recover 
Theorem \l.l\ e ^ H'^{Z) is analytic at Eq. 

Proof. The map 

n-^F^x {Ffy R 

is analytic at as desired. □ 
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Appendices 

A Proof of Proposition 12.11 

Proof. Without loss of generality, we assume S is convex (otherwise consider the convex hull 
of S). It follows from standard arguments that max norm and sum norm are equivalent. 
More specifically, for another metric di defined by 



di{u, v) 

we have rfe ~ c^i- For metric d2 defined by 



Uj/Uj 
X Vi/Vj 



d2{u,v) = {Ui/Uj - Vi/Vj)^. 

Applying mean value theorem to log function, one concludes that di ^ d2- Note that 

Ui Vi 

Ui - Vi 



Ui + U2-\ \-Uk Vi+V2-\ \-Vk 

1 1 



Ui/Ui + U2/Ui-\ h Uk/Ui Vi/Vi +V2/Vi-\ h Vk/Vi 



25 



Applying the mean value theorem to function /, defined as 

1 



f{Xi,X2, ■■■ ,Xb) 



Xi+ X2 + \- Xk 

we conclude that there exists ^ G 5 such that 

Ui-Vi= ■ {Ui/Ui - Vi/Vi, ■ ■ ■ ,Uk/Ui - Vk/Vi). 

It follows from Cauchy inequality that there exists a positive constant Di such that 

dE{u,v) < Did2{u,v). 

Similarly consider Ui/uj — Vi/vj, and apply mean value theorem to function g, defined as 
g{x,y) = x/y, we show that there exists a positive constant D2 such that 

d2{u,v) < D2d^{u,v). 

Namely d2 ~ ds- Thus the claim in this Proposition follows, namely there exist two positive 
constant Ci < C2 such that for any two points u,v E S, 



CidB{u,v) < d-E,{u,v) < C2d-Q{u,v). 



□ 



B Proof of Lemma 17.91 : 

Recall that for a non-negative matrix the canonical form of B is: 



B 



Bu Bi2 
B22 





Bin 
B2n 



where Bu is either an irreducible matrix (called irreducible components) or a 1 x 1 zero 
matrix. 

Condition 2 in Theorem 17. 81 is equivalent to the statement that B = B{eQ) has a unique 
irreducible component of maximal spectral radius and that this component is primitive. Let 
C denote the square matrix obtained by restricting B to this component and let Sc denote 
the set of indices corresponding to this component. Let Ai denote the spectral radius of B, 
equivalently the spectral radius of C. 

Let Xi{e) denote the largest, in modulus, eigenvalue of B{e). Since the entries of B{e) 
are analytic in e and Ai is simple, it follows that if the complex neighborhood Q is chosen 
sufficiently small, then Xi{e) is analytic function of eE Q. 

The columns (resp., rows) of Adj{Xi{e)I — B{£)) are right (resp., left) eigenvectors of 
B{e) corresponding to Xi{e). By choosing x{e) (resp. y{e)) to be a fixed column (resp. row) 
of Adi{Xi{e)I — B{e)) and then replacing x{e) and y{e) by appropriately rescaled versions, 
we may assume that: 
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• x{eo),y{eo) > 0, and they are positive on Sc 

• y{e) = 1 

• x{e) and y{e) are analytic in e & Q 
Let 

and 

U{s-)=Bis-)-V{s-). 

Then V{e) is the restriction of B{e) to the subspace corresponding to \i{e) and U{e) is 
the restriction to the subspace corresponding to the remainder of the spectrum oi B{e). It 
follows that 

U{e)V{e) = = V{e)U{e). 

Let fj,{e) denote the spectral radius of U{e). By condition 2, n{eo) < \i{eo). Thus, there 
is a constant u > such that if the neigbourhood fl is sufficiently small, then for all e& Q 

li{e)<u< \\i{e)\. 

Thus, by Lemma 17.51 and making still Vt smaller if necessary, there is a constant Ki > Q 
such that for all z, j, all n and all £ G f2, 

\U^^{e)\ < K,y\ (B.38) 

Let r = r{eo), c = c{eo), x = x{eo) and y = y{so). In the following we will show that 
the irreducibility of A will rule out the possibility that c is non-zero only in non-maximal 
spectral radius irreducible components of B, and so we can extend a{e, n) and h{e, n) from 
real to complex. 

Let So G Sc- Since A(eo) is irreducible and r is nonnegative, but not the zero vector, 
for some jo, {f'B^°)sg > 0. Similarly, for any index si other than 1 of the underlying Markov 
chain, there exists ji such that -B^Js-^ > 0. Choose si to be any index such that Cs^ > 0. Since 
C is primitive, it then follows that there is a constant K2 such that for sufficiently large n, 

rx ■ ycWl + rU'^c = rVc + rU''c = rB''c > 

which by ()B.38|1 implies that rx ■ yc > 0. Therefore if fl is sufficiently small, there exists a 
positive constant K4 such that 

\r{6)x{6) •y(£)c(£)| > K^, 

for £ G fi. 

Let K3 be an upper bound on the entries of and \c{e)\. 

Thus, for all n and all e*G fi, we have 

|r(£)5"(£)c(£)| < |r(£)f/"(£)c(£)| + \r{e)V''{e)c{e)\ < IBl^K^Kiu'' + \B\^K^,\Xi{e)r 
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and 



|r(£)fi"(e)c(e)| > |r(£)V"(e)c(e)| - |r(£)f/"(£)c(e)| > Ki\\i{e)\'' - IBI^KIK^u". 

With similar upper and lower bounds for \{r{e)B"'{e)l\, it follows that for sufficiently large 
n and all e&Q, 

7ri(e)r(£)E(£)"-il 

and 

7ri{e)r{e)B{e)''-^c{e) 
7ri(e)r(£)5(£)"-il 

are uniformly bounded from above and away from zero. By condition 1, for any finite 
collection of n, there is a (possibly smaller) neighborhood Q of Eq, such that for all e & Q, 
these quantities are uniformly bounded from above and away from zero. This completes the 
proof of Lemma 17.91 ( and therefore the proof of sufficiency for Theorem 17.81 ) 

C e\-^iy^is analytic 

In this appendix, we follow the notation in Section |H| Let t : X —>■ X he the right shift 
operator, which is a continuous mapping on X under the topology induced by the metric d. 
For / G C{X), one defines the pressure via a variational principle 

Pif) = sup (n^ir) + [ fdfx) , 

where M{X, r) denotes the set of r-invariant probability measures on X and H^{t) denotes 
measure-theoretic entropy. A member yU of M{X, r) is called an equilibrium state for / if 
Pif) = H,{T) + Jfd^^. 

For / e C{X) the Ruelle operator Cf : C{X) C{X) is defined ^ by 

iCjh){x)= J2 e^^'^Ky)- 

The connection between pressure and the Ruelle operator is as follows |24[ EH! ■ When 
/ G F^, P{f) is log A, where A is the spectral radius of Cf. The restriction of £/ to still 
has spectral radius A, and A is isolated from all other eigenvalues of the restricted operator. 
Using this, Ruelle apphed standard perturbation theory for linear operators |JJJ to conclude 
that pressure P{f) is real analytic on F^. Moreover, he showed that each f & F^ has a 
unique equihbrium state fif and the first order derivative of / i-^ P{f) on F^ is /i/, viewed 
as a linear functional on F^. So, the analyticity of P{f) implies that the equilibrium state 
^/ is also analytic in / G F^. 

We first claim that for f{e,z) = \ogp'^{zo\zZlo), we have = i^'^ as in ()8.3fl|l . 

To see this, first observe that the spectral radius A of £ = >C/(^,.) is 1; this follows from 
the observations: 

• the function 1 which is identically 1 on A" is a fixed point of £ - and - 
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• (see Proposition 5.16 of j21I) £"'(1)/A"' converges to a strictly positive function. 
Thus P{f{e, ■)) = 0. So, for /x"^ = we have 

V(r) + j f{e,W=^- 

But from ()8.H1|1 . we have 

By uniqueness of the equihbrium state, we thus obtain /^/(e,.) = z^^ as claimed. 

Since e ^— > f{e, ■) is analytic, it then follows that e^u'^is analytic, thereby completing 
the proof of Theorem 18.11 



References 

L. Arnold, V. M. Gundlach and L. Demetrius. Evolutionary formalism for products of 
positive random matrices. Annals of Applied Probability, 4:859-901, 1994. 

[2] J. J. Birch. Approximations for the entropy for functions of Markov chains. Ann. Math. Statist., 33:930-938, 1962.

[3] D. Blackwell. The entropy of functions of finite-state Markov chains. Trans. First Prague Conf. Information Theory, Statistical Decision Functions, Random Processes, pages 13-20, 1957.

[4] M. Cassandro and E. Olivieri. Renormalization group and analyticity in one dimension: A proof of Dobrushin's theorem. Commun. Math. Phys., 80, 255-269, 1981.

[5] J. R. Chazottes and E. Ugalde. Projection of Markov measures may be Gibbsian. J. Statist. Phys., Volume 111, Numbers 5-6, 1245-1272.

[6] R. L. Dobrushin. Analyticity of correlation functions in one-dimensional classical systems with slowly decreasing potentials. Commun. Math. Phys. 32, 269-289, 1973.

[7] S. Egner, V. Balakirsky, L. Tolhuizen, S. Baggen and H. Hollmann. On the entropy rate of a hidden Markov model. In Proceedings of the 2004 IEEE International Symposium on Information Theory, page 12, Chicago, U.S.A., 2004.

[8] G. Han and B. Marcus. Analyticity of entropy rate of a hidden Markov chain. In Proc. of IEEE International Symposium on Information Theory, Adelaide, Australia, September 4-September 9 2005, pages 2193-2197.

[9] R. Gharavi and V. Anantharam. An upper bound for the largest Lyapunov exponent of a Markovian product of nonnegative matrices. Preprint, January 1995.

[10] T. Holliday, A. Goldsmith and P. Glynn. On entropy and Lyapunov exponents for finite state channels. 2003. Available at http://wsl.stanford.edu/Publications/THolliday/Lyapunov.pdf.






[11] P. Jacquet, G. Seroussi and W. Szpankowski. On the entropy of a hidden Markov process. In Proceedings of the 2004 IEEE International Symposium on Information Theory, page 10, Chicago, U.S.A., 2004.

[12] T. Kato. Perturbation Theory for Linear Operators. Springer-Verlag, Berlin-Heidelberg-New York, 1976.

[13] D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge 
University Press, 1995. 

[14] J. Lorinczi, C. Maes and K. V. Velde. Transformations of Gibbs measures. Probab. 
Theory Relat. Fields, Volume 112, 121-147, 1998. 

[15] B. Marcus, K. Petersen and S. Williams. Transmission rates and factors of Markov 
chains. Contemporary Mathematics, 26:279-294, 1984. 

[16] A. Mukherjea and K. Pothoven. Real and functional analysis. Plenum Press, New York, 
1978. 

[17] L. Nachbin. Introduction to functional analysis: Banach spaces and differential calculus. New York: M. Dekker, 1981.

[18] A. Onishchik. Lie groups and Lie algebra I. Encyclopaedia of Mathematical Sciences, v. 20. Springer-Verlag, 1993.

[19] E. Ordentlich and T. Weissman. On the optimality of symbol by symbol filtering and denoising. IEEE Transactions on Information Theory, Volume 52, Issue 1, Jan. 2006, pages 19-40.

[20] E. Ordentlich and T. Weissman. New bounds on the entropy rate of hidden Markov process. IEEE Information Theory Workshop, 24-29 Oct. 2004, pages 117-122.

[21] Y. Peres. Analytic dependence of Lyapunov exponents on transition probabilities, volume 1486 of Lecture Notes in Mathematics, Lyapunov's Exponents. Proceedings of a Workshop. Springer-Verlag, 1990.

[22] Y. Peres. Domains of analytic continuation for the top Lyapunov exponent. Ann. Inst. H. Poincaré Probab. Statist., 28(1):131-148, 1992.

[23] K. Petersen, A. Quas and S. Shin. Measures of maximal relative entropy. Ergod. Th. and Dynam. Sys., 23, 207-223, 2003.

[24] D. Ruelle. Thermodynamic formalism: the mathematical structures of classical equilibrium statistical mechanics. Addison-Wesley Pub. Co., Advanced Book Program, Reading, Mass, 1978.

[25] D. Ruelle. Analyticity properties of the characteristic exponents of random matrix 
products. Adv. Math., 32:68-80, 1979. 

[26] D. Ruelle. Differentiation of SRB states. Comm. Math. Phys., 187(1):227-241, 1997. 






[27] E. Seneta. Non-negative Matrices and Markov Chains. Springer Series in Statistics. Springer-Verlag, New York Heidelberg Berlin, 1980.

[28] B. V. Shabat. Introduction to complex analysis. Translations of Mathematical Monographs, v. 110. American Mathematical Society, Providence, R.I., 1992.

[29] J. L. Taylor. Several complex variables with connections to algebraic geometry and Lie 
groups. American Mathematical Society, Providence, R.I., 2002. 

[30] P. Walters. An introduction to ergodic theory, volume 79 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1982.

[31] K. Yosida. Functional analysis, 4th edition. Springer-Verlag, Berlin, 1974.

[32] O. Zuk, I. Kanter and E. Domany. Asymptotics of the entropy rate for a hidden Markov process. J. Stat. Phys., 121(3-4):343-360, 2005.

[33] O. Zuk, E. Domany, I. Kanter, and M. Aizenman. Taylor series expansions for the 
entropy rate of Hidden Markov Processes. ICC 2006, Istanbul. 






